# Introducing convolutional neural networks (ML Zero to Hero, part 3)

♪ (music) ♪ Hi, and welcome to episode three
of Zero to Hero with TensorFlow. In the previous episode,
you saw how to do basic computer vision using a deep neural network that matched the pixels
of an image to a label. So an image like this was matched to a numeric label
that represented it like this. But there was a limitation to that. The image you were looking at
had to have the subject centered in it and it had to be
the only thing in the image. So the code you wrote would
work for that shoe, but what about these? It wouldn’t be able
to identify all of them because it’s not trained to do so. For that we have to use something called
a convolutional neural network, which works a little differently
than what you’ve just seen. The idea behind
a convolutional neural network is that you filter the images before
training the deep neural network. After filtering the images, features within the images
could then come to the forefront and you would then spot
those features to identify something. A filter is simply a set of multipliers. So, for example, in this case,
if you’re looking at a particular pixel that has the value 192, and the filter is the values
in the red box, then you multiply 192 by 4.5, and each of its neighbors
by the respective filter value. So it’s neighbor above
and to the left is zero, so you multiply that by -1. Its upper neighbor is 64, so you
multiply that by zero and so on. Sum up the result, and you get
the new value for the pixel. Now this might seem a little odd, but check out the results
for some filters like this one that when multiplied over
the contents of the image, it removes almost everything
except the vertical lines. And this one, that removes almost
everything except the horizontal lines. This can then be combined
with something called pooling, which groups up the pixels in the image
and filters them down to a subset. So, for example, max pooling two by two will group the image
into sets of 2×2 pixels and simply pick the largest. The image will be reduced
to a quarter of its original size but the features can still be maintained. So the previous image after being filtered
and then max pooled could look like this. The image on the right is one quarter
the size of the one on the left, but the vertical line features
were maintained and indeed they were enhanced. So where did these filters come from? That’s the magic
of a convolutional neural network. They’re actually learned. They are just parameters
like those in the neurons of a neural network that
we saw in the last video. So as our image is fed
into the convolutional layer, a number of randomly initialized filters
will pass over the image. The results of these are fed
into the next layer and matching is performed
by the neural network. And over time, the filters
that give us the image outputs that give the best matches
will be learned and the process
is called feature extraction. Here is an example of how
a convolutional filter layer can help a computer visualize things. You can see across the top row here
that you actually have a shoe, but it has been filtered down
to the sole and the silhouette of a shoe by filters that learned
what a shoe looks like. You’ll run this code for yourself
in just a few minutes. Now, let’s take a look at the code to build a convolutional neural
network like this. So this code is very similar
to what you used earlier. We have a flattened input
that’s fed into a dense layer that in turn in fed into the final
dense layer that is our output. The only difference here is that
I haven’t specified the input shape. That’s because I’ll put a convolutional
layer on top of it like this. This layer takes the input
so we specify the input shape, and we’re telling it to generate
64 filters with this parameter. That is, it will generate 64 filters and multiply each
of them across the image, then each epoch, it will figure out
which filters gave the best signals to help match the images to their labels in much the same way it learned
which parameters worked best in the dense layer. The max pooling to compress the image
and enhance the features looks like this, and we can stack convolutional layers
on top of each other to really break down the image and try to learn
from very abstract features like this. With this methodology,
your network starts to learn based on the features of the image instead of just
the raw patterns of pixels. Two sleeves, it’s a shirt.
Two short sleeves, it’s a t-shirt. Sole and laces, it’s a shoe–
that type of thing. Now we are still looking
at just the simple image as a fashion at the moment but the principles will extend
into more complex images and you’ll see that in the next video. But before going there try out the notebook to see 