# Lecture 11: Introduction to TensorFlow

![](https://www.tensorflow.org/images/colab_logo_32px.png)
[Run in colab](https://colab.research.google.com/drive/1H8iqFsQn9FuoNregKha7MJAQVp_eT77-)

In [1]:
import datetime
now = datetime.datetime.now()
print("Last executed: " + now.strftime("%Y-%m-%d %H:%M:%S"))

Last executed: 2024-01-10 00:20:45


## Overview of TensorFlow

[TensorFlow](https://www.tensorflow.org/) is an open source library developed by Google for numerical computation. It is particularly well suited for large-scale machine learning.    

TensorFlow is based on the construction of *computational graphs*. It has evolved considerably since its open source release in 2015.  We will use TensorFlow 2 (TF2), which offers many additional features built on top of the core features (the most important is `tf.keras`, discussed in later lectures).

In [2]:
import numpy as np
import tensorflow as tf
from tensorflow import keras

2024-01-10 00:20:46.295156: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-10 00:20:46.707194: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-10 00:20:46.709789: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.




### Features 

- Similar to [`numpy`](https://numpy.org/doc/stable/) but with GPU support.
- Supports distributed computing.
- Includes a kind of just-in-time (JIT) compiler to optimise speed and memory usage.
- Computational graphs can be saved and exported.
- Supports autodiff and provides numerous advanced optimisers.

### TensorFlow's Python API

<br>

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture11_Images/tensorflow-Python-API.png" width="700px" style="display:block; margin:auto"/>

[Credit: Geron]

### TensorFlow's Architecture

<br>

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture11_Images/tensorflow-Architecture.png" width="700px" style="display:block; margin:auto"/>

[Credit: Geron]

At the lowest level TensorFlow is implemented in C++ so that it is highly efficient.

We will focus solely on the Python TensorFlow interfaces (the typical approach).  Most of the time you will simply need to interact with the Keras interface but sometimes you might want to use the low-level Python API for greater flexibility.

### Hardware

One of the factors responsible for the dramatic recent growth of machine learning is advances in computing power, in particular hardware that supports high levels of parallelism.

<br>

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture11_Images/cpu_gpu_tpu.png" width="750px" style="display:block; margin:auto"/>

- Central Processing Unit (CPU):
  - General purpose 
  - Low latency 
  - Low throughput
  - Sequential
  
- Graphics Processing Unit (GPU)
  - Specialised (for graphics initially)
  - High latency 
  - High throughput
  - Parallel execution
  
- Tensor Processing Unit (TPU)
  - Specialised for matrix operations
  - High latency
  - Very high throughput
  - Extreme parallel execution

In TensorFlow, many operations are implemented in low-level kernels optimised for specific hardware, e.g. CPUs, GPUs, or TPUs.

TensorFlow's execution engine will ensure operations are run efficiently (across multiple machines and devices if set up accordingly).
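As a minimal sketch of how device placement looks from Python, the following lists the devices visible to TensorFlow and pins a small computation to the CPU. The device string `"/CPU:0"` assumes at least one CPU device is visible (normally the case); the GPU list will simply be empty on a CPU-only machine.

```python
import tensorflow as tf

# List the devices TensorFlow can see; the GPU list is empty on a CPU-only machine.
cpus = tf.config.list_physical_devices("CPU")
gpus = tf.config.list_physical_devices("GPU")
print("CPUs:", cpus)
print("GPUs:", gpus)

# Explicitly place a small computation on the first CPU.
with tf.device("/CPU:0"):
    x = tf.constant([[1., 2.], [3., 4.]])
    y = x @ x

print(y.device)  # name of the device holding the result
```

Without an explicit `tf.device` context, TensorFlow chooses a device itself, typically preferring a GPU when one is available.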

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture11_Images/tensorflow-Architecture.png" width="700px" style="display:block; margin:auto"/>

[Credit: Geron]

#### Aside: chips optimised for machine learning are an active area of development

Google developed the Tensor Processing Unit (TPU).

[Graphcore](https://www.graphcore.ai/) have developed the Intelligence Processing Unit (IPU).

### Computational graphs

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture11_Images/computational_graph_simple.png" width="750px" style="display:block; margin:auto"/>

[Credit: Geron]

User code constructs the computational graph (can be constructed in Python).  With TensorFlow 2, graph construction is less explicit and much simpler.

TensorFlow takes the computational graph and runs it efficiently via optimised C++ code.
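To make the idea of a graph more concrete, the sketch below traces a small Python function into a graph and lists the types of the operations it contains. The function `quadratic` and its inputs are hypothetical examples chosen for illustration, and the exact list of operations may differ between TensorFlow versions.

```python
import tensorflow as tf

@tf.function
def quadratic(x, y):
    # Hypothetical example function: x^2 * y + y + 2
    return x * x * y + y + 2.0

# Trace the function for scalar float32 inputs to obtain a concrete graph.
scalar = tf.TensorSpec(shape=(), dtype=tf.float32)
concrete = quadratic.get_concrete_function(scalar, scalar)

# Each node of the graph is an operation (placeholders, constants, multiplications, ...).
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)

result = quadratic(tf.constant(3.0), tf.constant(4.0))
print(result)
```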

### Parallel and distributed computation

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture11_Images/computational_graph_hpc.png" width="750px" style="display:block; margin:auto"/>

[Credit: Geron]

Computational graphs can be broken up into different chunks, which are then run in parallel across many CPUs/GPUs/TPUs (or highly distributed systems).

This approach allows TensorFlow to scale to big data.

### Scaling to big data

For example, TensorFlow can be used to train neural networks with millions of parameters on training sets with billions of training instances.

It provides the infrastructure behind many of Google's large-scale machine learning products, e.g. Google Search, Google Photos, ...

## Tensors and operations

The TensorFlow API centres around *tensors* (essentially multi-dimensional arrays, generalising vectors and matrices), hence its name.

Tensors are similar to the NumPy [`ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html).

### Tensors

Can construct constant tensors with `tf.constant`.

In [3]:
tf.constant([[1., 2., 3.], [4., 5., 6.]]) # 2x3 matrix

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [4]:
tf.constant(42) # scalar

<tf.Tensor: shape=(), dtype=int32, numpy=42>

Tensors have a shape and data type (dtype).

In [5]:
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
t.shape

TensorShape([2, 3])

In [6]:
t.dtype

tf.float32

### Indexing

Tensor indexing is very similar to numpy.

In [7]:
t[:, 1:]

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>

In [8]:
t[..., 1,  tf.newaxis]

<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>

### Operations

A variety of tensor operations are possible.

In [9]:
t + 10

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>

In [10]:
tf.square(t)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>

In [11]:
t @ tf.transpose(t) # matrix multiplication

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>
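A few further operations, shown as a short sketch on the same 2x3 tensor: reductions along an axis and NumPy-style broadcasting. The scaling vector is a made-up example.

```python
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

col_sums = tf.reduce_sum(t, axis=0)    # sum over rows, shape (3,)
row_means = tf.reduce_mean(t, axis=1)  # mean over columns, shape (2,)

# A shape-(3,) tensor is broadcast across both rows of t.
scaled = t * tf.constant([1., 10., 100.])

print(col_sums)
print(row_means)
print(scaled)
```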

### Using `keras.backend`

The Keras API also includes its own low-level API with similar functionality, which is basically a wrapper for the corresponding TensorFlow operations (more on Keras in the next lecture).

In [12]:
from tensorflow import keras
K = keras.backend
K.square(K.transpose(t)) + 10

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[11., 26.],
       [14., 35.],
       [19., 46.]], dtype=float32)>

### Tensors and Numpy

**Note:** Since TensorFlow 2.4, a subset of the NumPy API is also available within TensorFlow as `tf.experimental.numpy`: https://www.tensorflow.org/api_docs/python/tf/experimental/numpy

Can create a tensor from ndarray.

In [13]:
a = np.array([2., 4., 5.])
tf.constant(a)

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>

Can convert a tensor to an ndarray.

In [14]:
t.numpy()

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

Can apply numpy operations to tensors and vice versa.

In [15]:
np.array(t)

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

In [16]:
tf.square(a)

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>

In [17]:
np.square(t)

array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)

### Conflicting Types

TensorFlow does not perform type conversions automatically since they can significantly degrade performance and can easily go unnoticed.

Therefore you cannot add a float to an integer.

In [18]:
try:
    tf.constant(2.0) + tf.constant(40)
except tf.errors.InvalidArgumentError as ex:
    print(ex)

cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a int32 tensor [Op:AddV2] name: 


Similarly, you cannot add a float (32 bit) and a double (64 bit).

In [19]:
try:
    tf.constant(2.0) + tf.constant(40., dtype=tf.float64)
except tf.errors.InvalidArgumentError as ex:
    print(ex)

cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:AddV2] name: 


To combine tensors of different types, you need to explicitly cast them to a common type first using `tf.cast`.

In [20]:
t2 = tf.constant(40., dtype=tf.float64)
tf.constant(2.0) + tf.cast(t2, tf.float32)

<tf.Tensor: shape=(), dtype=float32, numpy=42.0>
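A common way to run into this in practice is when mixing NumPy and TensorFlow: NumPy defaults to 64-bit floats whereas TensorFlow defaults to 32-bit floats. A minimal sketch of one way to handle this, by requesting the dtype when creating the tensor:

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])               # NumPy defaults to float64
t64 = tf.constant(a)                     # inherits float64 from the array
t32 = tf.constant(a, dtype=tf.float32)   # request float32 explicitly

# Both operands are now float32, so the addition succeeds.
total = tf.constant([1., 1., 1.]) + t32

print(t64.dtype, t32.dtype)
print(total)
```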

### Variables

The tensors considered so far are constant and immutable, so they cannot be changed.

We also need tensors that can act as variables that change over time, for example the weights of a neural network, which are regularly updated during training.

In [21]:
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

Can be modified in place using the `assign` method.

In [22]:
v.assign(2 * v)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>
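Individual entries and increments can be handled similarly. The sketch below (with arbitrary example values) assigns to a single element and adds to the whole variable in place, and shows that ordinary Python item assignment is not supported.

```python
import tensorflow as tf

v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

v[0, 1].assign(42.)            # update a single element in place
v.assign_add(tf.ones_like(v))  # add 1 to every element in place

# Direct item assignment is not supported, so use assign instead.
item_assignment_failed = False
try:
    v[0, 1] = 0.
except TypeError as ex:
    item_assignment_failed = True
    print(ex)

print(v.numpy())
```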

## TensorFlow Functions

Once TensorFlow has constructed a computational graph, it optimises it (e.g. simplifying expressions, pruning unused nodes, etc.).

Consequently, a TensorFlow function will typically run a lot faster than the equivalent Python function, especially when it performs complex computations.

`tf.function` can be used to turn a Python function into a TensorFlow function.

In [23]:
def cube(x):
    return x ** 3

In [24]:
cube(2)

8

In [25]:
tf_cube = tf.function(cube)
tf_cube

<tensorflow.python.eager.polymorphic_function.polymorphic_function.Function at 0x7f00e437e1c0>

In [26]:
tf_cube(2)

<tf.Tensor: shape=(), dtype=int32, numpy=8>

In [27]:
tf_cube(tf.constant(2.0))

<tf.Tensor: shape=(), dtype=float32, numpy=8.0>

When you write custom functionality with a Keras model, Keras will automatically convert your function to a TensorFlow function so typically you will not need to worry about this.

**Exercises:** *You can now complete Exercise 1 in the exercises associated with this lecture.*

### Reuse

A TensorFlow function generates a new graph for each unique set of input shapes and data types.  The graph is then cached for subsequent use.

This is only the case for tensor arguments.

If you pass numerical Python values, a new graph will be created for each distinct value.  This could considerably slow down your code and may use up a lot of RAM (for the storage of many computational graphs).
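This behaviour can be observed with a small sketch that counts how often the Python body of the function is executed, which happens only when a new graph is traced. The `trace_log` list is a hypothetical helper introduced purely for illustration.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper: records each time the function is traced

@tf.function
def cube(x):
    trace_log.append(x)  # Python side effect: runs only while tracing a new graph
    return x ** 3

cube(tf.constant(2.0))
cube(tf.constant(3.0))           # same shape and dtype: cached graph is reused
traces_after_tensors = len(trace_log)

cube(2)
cube(3)                          # different Python value: a new graph is traced
traces_after_python = len(trace_log)

print(traces_after_tensors, traces_after_python)
```

Wrapping numerical values in tensors (e.g. `tf.constant(2)`) before calling the function avoids the repeated tracing.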

## Gradients

As we have seen when considering training, we often need to compute the gradients to train models, e.g. for gradient descent based approaches.  Typically we need to compute the gradient of the cost function with respect to the model weights.  

TensorFlow supports automatic differentiation (autodiff), which allows gradients to be computed automatically.

Consider the following function.

In [28]:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

We will compute gradients analytically, numerically and using TensorFlow's Autodiff functionality at the following point.

In [29]:
w1, w2 = 5.0, 3.0

### Computing gradients analytically

In [30]:
def df_dw1(w1, w2):
    return 6 * w1 + 2 * w2
def df_dw2(w1, w2):
    return 2 * w1

In [31]:
df_dw1(w1, w2)

36.0

In [32]:
df_dw2(w1, w2)

10.0

### Computing gradients numerically

Compute the gradient by finite differences.

In [33]:
eps = 1e-6
(f(w1 + eps, w2) - f(w1, w2)) / eps

36.000003007075065

In [34]:
(f(w1, w2 + eps) - f(w1, w2)) / eps

10.000000003174137

Gradients computed this way are only approximate.

At least one extra function evaluation is required for every parameter.  This is computationally infeasible in many cases, e.g. large neural networks with hundreds of thousands or millions of parameters (or more!).
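The cost can be made explicit with a small pure-Python sketch. Note that it swaps the forward differences used above for central differences, which are usually more accurate but need two function evaluations per parameter. The helper `numerical_gradient` is a hypothetical name introduced for illustration.

```python
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

def numerical_gradient(func, point, eps=1e-6):
    """Central-difference gradient: two function evaluations per parameter."""
    grads = []
    for i in range(len(point)):
        upper = list(point)
        lower = list(point)
        upper[i] += eps
        lower[i] -= eps
        grads.append((func(*upper) - func(*lower)) / (2 * eps))
    return grads

grads = numerical_gradient(f, [5.0, 3.0])
print(grads)
```

The loop over parameters is what makes this approach scale poorly: a model with a million weights would need on the order of a million evaluations of the cost function per gradient.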

### Computing gradients with Autodiff

Autodiff builds derivatives of each stage of the computational graph so that gradients can be computed automatically and efficiently.

In [35]:
w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)

gradients = tape.gradient(z, [w1, w2])

In [36]:
gradients

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]

Reverse-mode autodiff requires only one forward and one reverse pass through the graph, regardless of how many derivatives need to be computed, and the result does not suffer from any finite-difference approximation error (it is limited only by machine precision).
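As a sketch of how such gradients are typically used, the following runs plain gradient descent on a toy loss. The loss `(w - 3)^2`, the learning rate and the number of steps are arbitrary choices made for illustration, not part of the lecture's example.

```python
import tensorflow as tf

w = tf.Variable(0.0)
learning_rate = 0.1  # arbitrary choice for this toy problem

for _ in range(100):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2         # toy loss with its minimum at w = 3
    grad = tape.gradient(loss, w)     # d(loss)/dw computed by autodiff
    w.assign_sub(learning_rate * grad)  # gradient descent update, in place

print(w.numpy())
```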

#### Persistence

The tape is erased immediately after the call to its `gradient` method, so calling `gradient` a second time will fail.

In [37]:
with tf.GradientTape() as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)
try:
    dz_dw2 = tape.gradient(z, w2)
except RuntimeError as ex:
    print(ex)

A non-persistent GradientTape can only be used to compute one set of gradients (or jacobians)


Can make the tape persistent if you need to call it more than once.  Then be sure to delete it once done to free resources.

In [38]:
with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)
dz_dw2 = tape.gradient(z, w2) # works now!
del tape

In [39]:
dz_dw1, dz_dw2

(<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>)

#### Computing gradients wrt variables and watched tensors

The tape only tracks variables (recall constants are immutable so it does not make sense to compute a gradient with respect to a constant).

If you try to compute the gradient with respect to (wrt) anything other than a variable you will get a None result.

In [40]:
c1, c2 = tf.constant(5.), tf.constant(3.)
with tf.GradientTape() as tape:
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2])
gradients

[None, None]

But you can `watch` tensors and then compute gradients with respect to watched tensors as if they were variables.

In [41]:
with tf.GradientTape() as tape:
    tape.watch(c1)
    tape.watch(c2)
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2])
gradients

[<tf.Tensor: shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]

#### Stopping gradients propagating

Sometimes you may want to stop gradients propagating through the computational graph.

This can be performed with `tf.stop_gradient`, which returns its input unchanged in the forward pass but is treated as a constant in the reverse (gradient) pass.

In [42]:
def f(w1, w2):
    return 3 * w1 ** 2 + tf.stop_gradient(2 * w1 * w2)

with tf.GradientTape() as tape:
    z = f(w1, w2)

tape.gradient(z, [w1, w2])

[<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, None]

The gradient with respect to `w1` is now `6 * w1 = 30`, since the `2 * w1 * w2` term is treated as a constant.  The gradient with respect to `w2` is `None`, since `w2` only appears inside `tf.stop_gradient`.

**Exercises:** *You can now complete Exercise 2 in the exercises associated with this lecture.*