Convolutional Neural Networks – The Math of Intelligence (Week 4)

Hello world, it's Siraj, and we're going to build a convolutional network using no libraries! I mean, just NumPy, but no libraries! No TensorFlow, no PyTorch, none of it. We're going to look at the math behind it, and we're going to build it in Python with just NumPy for the matrix math, okay? Let me start off with a demo of what it's going to be able to do: recognize any character that you draw with your mouse. So you could draw a six like that and hit submit. It'll start working, and then it will say it's a six. And if you don't want to use a six, you could draw a letter like 'A'. Any number or letter, it's going to be able to detect and predict. It's going to be really cool, because we're wrapping it into a web app using the Flask web framework. So that's what we're going to do today, and this is the first neural network that we're building from scratch in this course. I mean, we made one in the weekly video, but this is the real, hardcore convolutional network with all the layers, all the functions, everything, okay? So let's start off with what it's inspired by. Well, it's inspired by Yann LeCun, the genius... no, it's not. Yann LeCun is a director of AI at Facebook, and he's a total G. He is awesome because he was inspired by these original two guys right here (David Hubel and Torsten Wiesel), who published a paper in, I think, '68, or was it the early 60s or the 70s? The paper was on the mammalian visual cortex, and here's a great image of the idea they had. Let me make it a lot bigger. So the idea they had was that mammals all see in a very similar way, and that way is hierarchical. You have a collection of cells, or neurons, and these cells cluster, and these clusters represent different features that are learned.
Okay, so here, in terms of neuroscience, they call these clusters V1, V2, V4, posterior IT, and so on. They have names for all of these clusters of neurons in the brain, all the neuroscience terminology, but what we need to know is what's happening at a high level: every time you see something, or to be more accurate, whenever you detect something, a series of clusters, or layers, of neurons is activated. So if I detect a dog or a face or whatever, a series of layers or clusters of neurons fires, and each of these clusters detects a set of features. These features get more abstract the higher up the hierarchy of clusters you go. You can think of it as a vertical hierarchy or a horizontal hierarchy, it doesn't matter; the idea is that there is a hierarchy of features. At the start these features are very simple: they're lines and edges. Then they get more abstract and become shapes, then more complex shapes, and eventually, at the highest cluster level, you have the entire face or the entire dog or whatever it is. That's how the mammalian visual cortex works. And that's what Yann LeCun and his team built on in '98, when they published probably the landmark paper on convolutional nets. That's arguable, I guess, because Krizhevsky's ImageNet paper in 2012 was pretty good, but anyway, Yann LeCun's a G, alright. I just want to say that.
He had the idea to be inspired by three features of the human, or mammalian, visual cortex. First, local connections: the clusters of neurons, how each set of neurons in a cluster is connected to each other and represents some set of features. Second, layering: there's a hierarchy of features that are learned. Third, spatial invariance. What does this term, spatial invariance, mean? It means that whenever you or I detect something, let's say we're seeing a shoe, we know it's a shoe. Whether it's a Yeezy or an Adidas or whatever it is, you know it's a shoe. It could be shaped this way or that way, it could be rotated or transformed; no matter how it varies, we can still detect that it's a shoe. Our recognition doesn't depend on where it is or how it's positioned: it's spatially invariant. So those three concepts are what inspired the first convolutional neural networks: programmatic neural networks designed to mimic the mammalian visual cortex. How cool is that? It's so cool. So how does this thing work? Let's look at it. We have a set of layers, okay, and we'll talk about what these layers mean. A layer, in each case, is a series of operations that we're applying. So let's say we have some input image, like this orange. By the way, this is a convolutional network; this is what we're building. You'll notice that this image of a convolutional network isn't what you normally see when you think of a neural network, right? You always see that image of circles with everything connected.
So why is it different for convolutional networks? Because in a convolutional network, not every neuron in every layer is connected to every neuron in the next layer. Why? Because that would be too computationally expensive. I'll go over that in a second, but the idea is this: if you look here, there is a part of the image that is connected, this little square of the orange, and that is called the receptive field, okay? I'm going to go over all of this, and it's going to make more and more sense as I go further in depth, so stay with me. We have a receptive field: some part of the image that we're focused on, and by focused I mean that it's the part of the image we apply a convolution operation to. We take that receptive field and slide it across the image. You're going to see exactly what I'm talking about in a second; I'm just going over it at a high level. As we slide over the image, we apply a dot product between the weight matrix at a layer and each part of the image, iteratively. So there are really two reasons convolutional networks look different. The first reason is that not every neuron in each layer is connected to every neuron in the next layer; only a part of it is. To borrow from discrete math, it would be a combinatorial explosion to connect every single pixel value in an image to every single value in the next layer of features.
It would be a huge number of connections. So what we do instead is take a part of that image and iteratively slide over it. At a high level, think of the sliding part as a flashlight: the filter at each layer shines over the receptive field, that box, and you shine it across the image, applying dot products to all of these numbers, just like that. That was just a high-level view, so you're not supposed to understand it all yet. We're still going deeper. So check out this beautiful image right here; isn't it beautiful? Also, you're beautiful for watching this, so thank you. Seriously, I love you guys so much; you're the reason I do this every week. By the way, I want to go on a tangent for one more thing: no one thought the people who subscribe to my channel existed. We are programmers who are smart, and we are also cool. No one thought these people existed, but we exist, okay? We are smart and we are cool. So you are amazing, okay? Anyway, back to this: this is another way of looking at the network. We're looking at it in different ways so we can build a spatially invariant image in our head of what a convolutional network is like, right?
No matter what that image looks like, we're going to learn to recognize a convolutional network when we see one. I'm just meta-applying this logic to what we're learning. So at each layer, we apply a series of dot products between the weight matrices and the input matrix. Let's get a third image. We perform a series of operations at each layer, and we can split a convolutional network into two separate parts. The first part is feature learning, which happens from the head of the network through the middle to almost the tail end, and at the very tail end is classification. So there's the feature learning part, and then there's the classification part. In the feature learning part, three operations happen over and over again, and we can call them convolutional blocks. Let's just call them convolutional blocks; I'm coining the term. First we apply convolution, then we apply ReLU or any kind of activation, and then we apply pooling, and we repeat. Those three operations make up a single convolutional block, okay? So: convolution, ReLU, pooling, repeat; convolution, ReLU, pooling, repeat; convolution, ReLU, pooling, okay?
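To put rough numbers on the "combinatorial explosion" argument, here is a back-of-the-envelope sketch. The 28x28 grayscale input, the 3x3 filter size, and the count of 32 filters are illustrative assumptions, not values from the video; the point is only that a convolutional filter is reused at every position, so its parameter count does not grow with the image size.

```python
# Back-of-the-envelope parameter counts for one layer on a 28x28 grayscale image.
# Image size, filter size, and filter count are illustrative assumptions.


def dense_params(in_units: int, out_units: int) -> int:
    """Weights plus biases when every input connects to every output."""
    return in_units * out_units + out_units


def conv_params(kernel_size: int, in_channels: int, num_filters: int) -> int:
    """Weights plus biases for a conv layer; each filter is reused at every position."""
    return kernel_size * kernel_size * in_channels * num_filters + num_filters


pixels = 28 * 28                                  # 784 input values
fully_connected = dense_params(pixels, pixels)    # image -> same-size hidden layer
convolutional = conv_params(3, 1, 32)             # 32 learned 3x3 filters

print(f"fully connected: {fully_connected:,}")    # fully connected: 615,440
print(f"convolutional:   {convolutional:,}")      # convolutional:   320
```

The convolutional layer even produces 32 feature maps instead of one, yet it needs roughly 1,900 times fewer parameters than the fully connected version.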
Usually you have at least three blocks. If you're building Inception by Google, you have 15 of these, but either way you have these convolutional blocks, and at the very end you flatten that output into a one-dimensional vector. Then you apply a fully connected layer to it, which means you connect all the neurons in one layer to the next one, because we want to harness everything we've learned so far. That's why we fully connect at the end. Then we take those learnings and squash them into a set of probability values with our last function, softmax. Each of these probabilities is the probability of a specific class that the input could be, and we take the max value, let's say 72%, and say, okay, 72% for banana, so now we know it's a banana. Hopefully you get some of it, but I know it's very confusing, and we're about to go even deeper. So get ready for this. I haven't even started yet, okay? So anyway, step one. For step one, we prepare a dataset of images. When you think of an image, hopefully you think of a matrix of pixel values; if you don't, think of it that way now. You're thinking of an image as a matrix of pixels, rows by columns, where each point in the matrix is a pixel with a value between 0 and 255. But in terms of convolutional networks, it's actually better to think of an image as a three-dimensional matrix. And you're like, what? Well, it's three dimensions: the first dimension is the length of the image, the second is the width, and the third is the depth. So wait, what is the depth?
The depth represents the channels, and there are three channels for color images: red, green, and blue. A grayscale image has just one channel, for black and white, but we're talking about color images, so there are three channels, and each channel has its own length-by-width matrix. The values in each of these three 2D matrices represent the amount of redness, greenness, or blueness, between 0 and 255. So in terms of convolutional nets, we think of images as three-dimensional arrays of pixels, okay? That's our input image, and it has an associated label. We're talking about supervised learning: learning the mapping between the input data and the output label. Dog image, dog label; learn the mapping, and then, given a new dog image, what is the label? Well, you just learned it. And we learn it through backpropagation. Backpropagate to update weights; remember the rhyme. You know what it is. Hey, I haven't rapped yet in this series, but I will, don't worry, it's coming. Anyway, every image is a matrix of pixel values, and we know they're between 0 and 255. We can use several training datasets. Two really popular ones are CIFAR and COCO, and there are a bunch of others as well. These are huge datasets, and you can find smaller versions of them. Every image in them, whether it's dogs, cars, airplanes, or people, has a label: handmade labels by humans, which is great for us. Okay, so that's it.
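Here is a minimal NumPy sketch of the "image as a three-dimensional matrix" idea. The 4x5 size, the random pixel values, and the "dog" label are made up for illustration; the layout (height, width, channels) is one common convention, not necessarily the one the video's code uses.

```python
import numpy as np

# A hypothetical 4x5 color image stored as height x width x channels (RGB).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

height, width, depth = image.shape   # depth == 3 channels: red, green, blue
red = image[:, :, 0]                 # 2D matrix: how "red" each pixel is (0-255)
green = image[:, :, 1]
blue = image[:, :, 2]

# A grayscale image needs only one channel, so it is just a 2D matrix.
gray_image = rng.integers(0, 256, size=(4, 5), dtype=np.uint8)

# Supervised learning: each training image is paired with a human-made label.
training_example = (image, "dog")

print(image.shape, red.shape, gray_image.shape)   # (4, 5, 3) (4, 5) (4, 5)
```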
That's step one: get your training data, which is your images. Step two is to perform convolution. Now you might be asking, what is convolution? Well, I'm here to tell you that convolution is a super dope operation, because it's not just used in computer science and machine learning; it's used in almost every field of engineering. Think of convolution as two paint buckets. You have one paint bucket that's red and another that's blue, and what you do is just smear it all over yourself. No, you don't do that. What you do is take these two paint buckets and combine them into one paint bucket, and that new paint bucket is going to be a new color, whatever that combination of colors is. That's convolution: an operation that takes two separate pieces of data, or two matrices, and combines them. So you can think of convolution as synonymous with combination, okay? Then why do we apply it in convolutional networks? Because we're combining the values for each of these layers with the input matrix. Think of the input as a matrix. Well, it's three-dimensional, a 3D tensor, but we apply the operation across each of its three channels, so just think of it as a matrix for right now. At each layer there is a weight. And by the way, there are a lot of interchangeable terms in machine learning, and it's easy to get confused here, so I want to set the record straight for a second: in convolutional networks, weight is the same as feature matrix, which is the same as filter, and you'll even see kernel. Feature map gets thrown in too, although it more often means the output you get after applying a filter.
So there are actually five terms floating around, and I can see how that's going to be confusing, but the basic idea is this: you have an input matrix, which is your image, and then you have a set of matrices, which are the features that are learned: edges, shapes, more abstract shapes. That's all it is: matrices being multiplied by matrices, all the way through, just a chain of them, okay? So for convolution, we take a matrix and multiply it by the values of the input matrix in a certain region. This is what I meant by the receptive field: we don't multiply the whole thing at once; we multiply by a little part of it, the receptive field, and then we slide it, and we can define the interval of that sliding window (the stride). I know I'm talking without coding; the coding is coming, believe me, but let's get it conceptually first. So we multiply the feature matrix by the input image for every row and every column, multiply, multiply, multiply, and the result is a new matrix, the output, which is called the convolved feature. We use that output as the input for the next layer, and we repeat the process over and over again. Obviously, there are two more parts here.
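The sliding-window multiply-and-sum described above can be sketched in a few lines of NumPy. This is a minimal sketch under simplifying assumptions: a single-channel image, no padding, and a configurable stride. Like most CNN code, it slides the filter without flipping it, so strictly speaking it computes cross-correlation; the function name `convolve2d` is our own, not from the video's code.

```python
import numpy as np


def convolve2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image`, taking a multiply-and-sum at each position.

    No padding ("valid" mode); the kernel is not flipped (cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            receptive_field = image[r:r + kh, c:c + kw]      # the "flashlight" patch
            output[i, j] = np.sum(receptive_field * kernel)  # element-wise multiply, then sum
    return output


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
feature_map = convolve2d(image, kernel)          # stride 1 -> shape (3, 3)
strided = convolve2d(image, kernel, stride=2)    # stride 2 -> shape (2, 2)
print(feature_map)
```

Every entry comes out as -5 here, because in this toy image each pixel is exactly 5 less than its lower-right neighbor, and the kernel subtracts one from the other. A larger stride simply skips positions, which shrinks the output.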
There's the activation, ReLU, and then there's pooling, which I'll talk about as well, but that's the basic idea behind convolution. We call it convolution because we're combining, or convolving, the weight matrix (or filter, or kernel, whatever you want to call it) with that input, and we use that output as the input for the next layer after activating and pooling it. And we apply it across all of the dimensions of that input matrix, okay? That gives us our activation map, or feature map; so many interchangeable terms here. Anyway, it's computed using the dot product. So you might be thinking, okay, I see how there's a dot product, I see how there's matrix multiplication, but how does that really tell us what features there are? You're probably still making the connection of why this series of matrix operations helps us detect features. Well, here's what happens, and here's the great thing about matrices and having several of them. Let's say we learn a filter, or weight, whatever you want to call it. You know what, moving forward, let's just call it a filter for the rest of this video. When we train a filter over time on mouse pictures, for example, a filter at, let's say, the first layer might learn to detect a curve that looks like this one, right here. So what does the filter for detecting this specific type of curve look like? It's going to be a very sparse filter, which means there are a lot of zeros: all zeros except for right here.
You see these 30s, and notice that these values trace the shape: they go in the direction of the curve. So when we take this filter and perform the dot product, when we convolve it with some part of the mouse image, if it's over a part of the mouse that matches that feature exactly, then when we multiply all of those values element-wise and sum them up (that's the convolution operation right there), we get a big number, okay? That's how we know we've detected a feature: we multiplied, summed it up, and got a large number. If not, let's say the receptive field is over a different part of the mouse where that curve doesn't exist, the result is going to be zero. If you compare the locations of these 30s with the same locations in this pixel representation of the mouse image, those pixels are zeros, and what happens when you multiply zero by 30? You get zero. That's why it's important that irrelevant data is zero: we want the irrelevant parts to be 0 in the filters that we learn, and in the input images as well. I could go even further into convolution. It's not really necessary, but it is super dope. This is a great blog post, by the way, and I definitely encourage you to read it; it's linked in the notebook. This dude Tim goes into the idea of convolution, talks about how it's applied in all these different engineering fields, and goes into the formula for what's called the convolution theorem, which I'm going to go over at a high level.
There's a discrete version and a continuous version. Discrete is when there are definite, separate values, like 1 or 0, black or white, and continuous is when there could be an infinite number of values, say between 0 and 1: 0.5, 0.25, 0.7, and so on forever. Here's the formula; let me make it bigger really quickly, because it's really cool. It's a general theorem that can be applied to all sorts of problems, but what's relevant to us is convolution applied to matrices. The convolution formula itself is the input times the kernel: we multiply every pair of corresponding values in the two matrices and sum them all up, and that's what the sigma term represents. It's the same multiply-and-sum we just did, written in a more mathematically precise way. The convolution theorem then says that this same operation becomes a simple element-wise multiplication once both inputs are converted into Fourier space, and that's where the fast Fourier transform comes in. The fast Fourier transform takes spatial or time-based data and converts it into Fourier space, which represents it as a combination of waves. You see this a lot in your day-to-day life: when you're listening to some sound and look at your MP3 player's frequency visualization, that's a Fourier transform at work. But I won't go into that; that's for sound and audio. Anyway, it's a really cool blog post, so definitely check it out. So back to this: we've talked about convolution, and now we're going to talk about pooling, right?
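The two ideas above can both be checked numerically. The first part is a hypothetical 5x5 "curve detector" (zeros everywhere except a stroke of 30s, loosely modeled on the one described, not copied from it) scored against a patch that contains the curve and a patch that doesn't. The second part is a 1D check of the convolution theorem: direct convolution (the multiply-and-sum sigma term) matches pointwise multiplication in Fourier space, provided both signals are zero-padded so the FFT's circular convolution equals ordinary convolution. Note that `np.convolve` flips the kernel, as the textbook definition of convolution does.

```python
import numpy as np

# Part 1: a hypothetical sparse curve detector (zeros except a curved stroke of 30s).
curve_filter = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 30, 0],
    [0, 0, 30, 0, 0],
    [0, 30, 0, 0, 0],
    [0, 30, 0, 0, 0],
])

matching_patch = (curve_filter > 0) * 50     # image patch containing the same curve
other_patch = np.zeros((5, 5), dtype=int)
other_patch[0, :] = 50                       # a horizontal stroke where the filter is all zeros

match_score = int(np.sum(curve_filter * matching_patch))  # 4 pixels * 30 * 50 = 6000
no_match_score = int(np.sum(curve_filter * other_patch))  # zeros wipe out the product

# Part 2: the convolution theorem in 1D.
x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([0.5, -1.0, 2.0])
direct = np.convolve(x, k)                   # (x * k)[n] = sum_m x[m] k[n - m]

n = len(x) + len(k) - 1                      # zero-pad to the full output length
via_fft = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(k, n)).real

print(match_score, no_match_score)           # 6000 0
print(np.allclose(direct, via_fft))          # True
```

For the small filters used in CNNs, the direct sliding-window version is usually simpler and fast enough; the FFT route pays off mainly for large kernels or long signals.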
So what is pooling? Whenever we apply convolution to an image, we get a series of feature maps at every layer. Each layer has more images, and smaller ones: the first few layers produce large images, the next few produce more of them but smaller, and so on. Then we squash everything with a fully connected layer and get some probability values with a softmax. What pooling does is make the matrices that we learn denser. Here's what I mean: if you perform convolution between an input and a feature matrix, or weight matrix, or filter, you get a matrix, and this matrix is going to be pretty big. What we can do is take the most important parts of that matrix and pass those on, which reduces the computational complexity of our model, okay? That's what pooling is all about. There are different types of pooling, and max pooling is the most used type, by the way. Here's how it works: we define a window size and a stride, meaning the intervals at which we look, and then for each of these windows,
we take the max value. So for this one right here, 6, 0, 8, the max value is 8, and for 1, 3, 12, 9, it's 12. We just take the biggest number; it's really simple. We do that for every window, and that's pooling. It gives us the most relevant parts of the image: if you think of the values in the matrix as pixel intensities, then by keeping the pixel with the highest intensity, we keep the most relevant feature. You see what I'm saying? So we've talked about convolution and pooling, and now the third part is activation. Remember how I said it's so important that values unrelated to the feature be zero, so that the result is zero if the feature isn't detected? The way we do that is with ReLU. ReLU stands for rectified linear unit, and it's an activation function, okay?
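Max pooling and ReLU are both short enough to sketch directly. This assumes a single 2D feature map and the common 2x2 window with stride 2; the numbers in `feature_map` are made up for the example rather than taken from the slide.

```python
import numpy as np


def max_pool(feature_map: np.ndarray, window: int = 2, stride: int = 2) -> np.ndarray:
    """Keep only the largest value in each window of a 2D feature map."""
    out_h = (feature_map.shape[0] - window) // stride + 1
    out_w = (feature_map.shape[1] - window) // stride + 1
    pooled = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = feature_map[r:r + window, c:c + window].max()
    return pooled


def relu(x: np.ndarray) -> np.ndarray:
    """Rectified linear unit: negative values become zero, the rest pass through."""
    return np.maximum(x, 0)


feature_map = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
print(max_pool(feature_map))
# [[6 8]
#  [3 4]]
print(relu(np.array([-3.0, 0.0, 2.5])))
```

A 4x4 map becomes 2x2, so each pooling step cuts the number of values by a factor of four with this window and stride.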
We use activation functions throughout neural networks. You can also call them nonlinearities, because they let our model learn nonlinear functions, not just linear ones, so basically any kind of function; that's the universal approximation theorem we talked about, and activation functions help make it happen. ReLU is a special kind of activation function that turns all negative numbers into zero, which makes the math easier and keeps it from breaking in our convolutional networks. So we apply ReLU: for every single pixel value in the input to the ReLU activation function, if it's negative, we make it zero. It's super simple; it'll be one line of code. You'll see exactly what I'm talking about. So that's how our convolutional blocks work. However, there's another step I didn't talk about. It's a nice-to-have, state-of-the-art convolutional networks always use it, and it's called dropout. Geoffrey Hinton, one of the pioneers of neural networks, and his group came up with a technique called dropout. A good analogy for dropout is old people, or not old people, but people who are stuck in their ways. Let me explain. What dropout does is turn neurons on and off randomly. What do I mean by that? I mean that, at some layer of the network, the outputs of randomly chosen neurons are set to zero. By doing this, our network is forced to learn new representations for the data, new pathways for the data to flow through; it can't always flow through this one neuron. The reason we use it is to prevent overfitting: we don't want the model to fit the training data too closely. Think of it this way: the older you get, the more set in your ways of thinking you are,
and the harder it is to think in new ways, because you're so set in your ways. One way to prevent that is to have a novel, crazy experience, whether it's skydiving or taking psychedelics or whatever it is, and what that does is force your brain to make new pathways. This increases your generalization ability, so you're not so overfit. That's a very rough, abstract analogy, but basically dropout is not as complex as it sounds; it can be done in three lines of code. Definitely check out this blog post as well, which I've linked. What it does is randomly pick some neurons in a layer and set them to zero. It's just three lines, okay, and you can look at it in this notebook. Then our last step is probability conversion. We've got this huge set of values, all these little images represented by this huge output matrix, and we want to make some sense out of it. We want to turn it into probabilities, and we do that using a softmax at the end. A softmax is a type of function, and it looks like this, right here. We plug these values into the softmax function, and it outputs a set of discrete probability values, one for each of the classes we're trying to predict, okay? Then, given all those probability values, we pick the biggest one using the argmax function in NumPy, and that gives us the most likely class. Those are the seven steps of a full forward pass through a convolutional network. So now you might be wondering, how do we train this thing? Using gradient descent, right, and the way we compute the gradients for a neural network is called backpropagation. Exactly, I hope you got that right. Anyway, okay?
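The last pieces of the forward pass (flatten, dropout, fully connected layer, softmax, argmax), plus a single gradient-descent update for that final layer, can be sketched in NumPy. Several assumptions are baked in and should be read as illustrative: the "inverted" dropout variant, which rescales surviving activations during training; the 3-class setup with made-up class meanings; the random stand-in for the last conv block's output; and the use of a cross-entropy loss with softmax, where the gradient with respect to the logits is `p - one_hot(label)`. A full CNN would keep pushing that gradient back through the pooling, ReLU, and convolution layers; only the last layer is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)


def dropout(x: np.ndarray, drop_prob: float = 0.5) -> np.ndarray:
    # Training only. Inverted dropout (an assumed, common variant): zero each unit
    # with probability drop_prob and rescale survivors to keep the expected value.
    mask = rng.random(x.shape) >= drop_prob
    return x * mask / (1.0 - drop_prob)


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()


# Hypothetical output of the last conv block: 4 feature maps of 2x2, after ReLU.
pooled = np.abs(rng.standard_normal((4, 2, 2)))
x = pooled.reshape(-1)                        # flatten to a 16-element vector

num_classes = 3                               # e.g. banana, apple, orange (illustrative)
W = rng.standard_normal((num_classes, x.size)) * 0.1   # fully connected layer
b = np.zeros(num_classes)

# Prediction (no dropout): probabilities, then argmax picks the most likely class.
probs = softmax(W @ x + b)
predicted = int(np.argmax(probs))

# One training step with softmax + cross-entropy; the true class index is assumed to be 1.
label = 1
learning_rate = 0.01
h = dropout(x)                                # dropout is applied only while training
p = softmax(W @ h + b)
loss_before = -np.log(p[label])
grad_logits = p.copy()
grad_logits[label] -= 1.0                     # dLoss/dlogits = p - one_hot(label)
W -= learning_rate * np.outer(grad_logits, h) # chain rule: dLoss/dW = grad_logits h^T
b -= learning_rate * grad_logits
loss_after = -np.log(softmax(W @ h + b)[label])

print(probs, predicted)
print(loss_before, loss_after)
```

After the update, the loss on this example goes down: the weights move in the direction that raises the probability of the true class. Repeating this over many labeled images is the whole training loop.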
So how do we learn these magic numbers, right? How do we learn what these weight values, the features, should be? Backpropagation is how we do it. We've talked quite a bit about backpropagation and gradient descent, but I'll go over it a little again. The idea is that we compute some error. This is supervised learning: we have a human label for some data. So we put in a dog image, or, to relate to this image here, a bicycle image with the bike label. We pass it through each layer: dot product, activation function, pooling, dot product, repeat, repeat, then squash it into probability values with softmax and pick the biggest one, and that gives us a prediction. We compare the prediction to the actual value to get an error, and we compute the partial derivative of the error with respect to each weight value, going backwards through the network, like this. For regression, like linear regression, we use the mean squared error, and for classification we use the softmax function (together with a cross-entropy loss). Remember how in the first neural network we built, the linear regression example, we used mean squared error to compute the error? Now we're using softmax. So we take the partial derivative of the error with respect to our weights, which gives us the gradient, and we use it to update each of those weight values, recursively going backward through the network. That's how it learns what the ideal feature, or weight matrix, values should be. But what about the other magic numbers?
What about the number of neurons, the number of features, the size of those features, the pooling window size, and the window stride? Well, that's an active area of research. There are best practices for the values you should use for those hyperparameters, the tuning knobs of our network, and Andrej Karpathy has some great material on this; he's probably the leading source on convolutional networks right now in terms of written content. Finding the ideal hyperparameters for a neural network is an active area of research, and we're still learning how to get them rather than just guessing and checking, which is what we do right now, and which isn't optimal. Anyway, two last things before we go into the code. First, when should you use this? We know it's used to classify images; we've talked about that. But you can also use convolutional networks to generate images, which is a little more advanced and for later on. To give you a little spoiler, a little teaser (in fact, this is in my Intro to Deep Learning playlist): you take a convolutional network, flip it, call it a deconvolutional network, and then you can take some text and create an image out of it. How crazy is that? There are also generative models where two networks fight each other to generate new images; there's a whole bunch of really cool, crazy stuff you can do. But anyway, when should you use a convolutional network? Anytime you have spatial 2D or 3D data. What do I mean? Images are obviously spatial. The word spatial implies that the position of the data matters. So you can apply it to sound, to images, or to text where the position of the text matters, because we have a flashlight, or filter, convolving over the data, right?
But if you have data like, say, customer data, where you could swap the rows and columns around and the order doesn't matter (they're still just features), then CNNs aren't a fit. A good rule of thumb: if you swap the rows and columns of your dataset and it's just as useful, meaning the spatial arrangement doesn't matter, then you don't want to use a CNN. And one last thing: a great example of using CNNs is robot learning. You can use a CNN for object detection and a CNN for grasp learning, combine the two, and get a robot that cooks, which is really cool. I've got a great TensorFlow example and a great adversarial networks example. Okay, let's go into the code now. What I'm going to do is look at the class for the convolutional network in NumPy as well as the prediction class; there are two classes here. These are our three imports. Pickle is for saving and loading our serialized model. What do I mean? Pickle is Python's built-in way of serializing objects so you can save data and load it up later (the format is Python-specific), and a bunch of other libraries use it as well. NumPy is for matrix math. And we've got our own little custom class for preprocessing the data, because we don't care about that part; we care about the machine learning part. So let's talk about our light OCR, or optical character recognition, class. In our initialize function, we load the weights from the pickle file and store them, and then store all the labels that we've loaded. We define how many rows and columns are in an image and load up our convolutional network using the light CNN class with our saved weights. So, assuming we've already trained our network, we load it with the saved weights from the pickle file, and then we define the number of pooling layers.
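Saving and loading trained weights with pickle can be sketched like this. The dictionary layout (a `"layers"` list of weight arrays and a `"labels"` list) is a hypothetical structure, not necessarily what the video's pickle file contains, and an in-memory buffer stands in for a file on disk. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import io
import pickle

import numpy as np

def save_model(model, f):
    """Serialize a model dictionary to a binary file-like object."""
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_model(f):
    """Deserialize a model dictionary; only use this on trusted data."""
    return pickle.load(f)

# Hypothetical layout: one weight array per layer plus the class labels
model = {
    "layers": [np.zeros((32, 1, 3, 3)), np.zeros((10,))],
    "labels": ["0", "1", "A"],
}
buffer = io.BytesIO()  # stands in for open("weights.pkl", "wb")
save_model(model, buffer)
buffer.seek(0)
restored = load_model(buffer)
print(restored["labels"])
```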
So once we have that, we can use this predict function. Given some new image, we reshape the image so it's the correct size to perform the dot product between that image and the first layer of our convolutional network. We feed it into our network, it outputs a prediction probability for a class, and we return it. That's super high level; we haven't even coded our CNN yet. That's our first class, the prediction class. Now we're going to look at the convolutional network class, and I'm going to go over the code and code some parts of it. In our initialize function, we initialize two lists: one to store the layers we've learned (the weights of each layer), and one for the size of the pooling area for max pooling. We load up our weights from our pickle file just like this, and then we have our predict function. Our predict function is where the real magic happens, so let's code what it looks like. Given some input x, we're going to feed it through all of these layers. We'll define what all of these functions look like later, but the first layer is going to be a convolutional layer: we feed in that first image, and we say:
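The reshaping step in predict can be sketched as below. The 28×28 size and the single grayscale channel are assumptions for illustration, and `prepare_image` is a hypothetical helper; the (images, channels, rows, cols) order matches the image and channel structure the layer functions loop over.

```python
import numpy as np

def prepare_image(pixels, rows=28, cols=28):
    """Reshape a flat list of pixel values into (images, channels, rows, cols).

    rows=28 and cols=28 are assumed defaults for this sketch.
    """
    x = np.asarray(pixels, dtype=np.float64)
    if x.size != rows * cols:
        raise ValueError(f"expected {rows * cols} pixels, got {x.size}")
    # A batch of one image with one (grayscale) channel
    return x.reshape(1, 1, rows, cols)

batch = prepare_image(range(28 * 28))
print(batch.shape)  # (1, 1, 28, 28)
```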
This is the first layer, the zeroth layer, and we set border mode equal to full. I'll talk about that part later on. So x equals the output of this layer; that's our first layer. Our next layer is going to be ReLU: we apply an activation to the outputs of the previous layer and set x equal to that. In other words, the output of the previous layer becomes the input of this layer. Then we keep going. We have another convolutional layer, and we do the same thing here: we take the output from the previous layer and define which layer this is, as well as a border mode, which I'll talk about at the very end. This time the border mode is valid. We set the output of that as the input of the next layer and just keep repeating. Now it's time to apply another nonlinearity, so we apply our ReLU again. Remember, these are convolutional blocks. We also want to pool. The order in which you do this varies; you can arrange it in different ways. I'm doing it in a certain way right now, and we could change it around, which would change our result, but the ordering within the block can be different. Right, so we're going to pool.
Pooling picks the most relevant features from that output. Then we perform dropout to prevent overfitting: there's a 0.25 probability (a 25% chance) that a neuron is deactivated, meaning we turn it off and set it to zero. That's our dropout probability value. Now we're getting into the second part of our network: not the feature-learning part but the classification part. So we flatten this layer, reducing the dimensionality of all that data so it's something we can then learn from, giving it its layer index (7), and once again the output becomes our input here. Then we have our first dense layer, a fully connected layer, so we're combining everything we've learned. Because we're getting really close to squashing these values into a set of probability values, we want to take all of our learnings and combine them with a fully connected layer. Then we squash the result with our softmax function (not sigmoid), and that gives us our output probabilities. Which of the probabilities do we want? The max one. We take the max probability, classify it just like that, and return that value. That's the highest level, and if you're using Keras or one of these high-level libraries, this is all your code would look like. But what we're going to do is look at these functions as well.
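The classification half of predict (flatten, dense, softmax, pick the max) can be sketched in a few NumPy lines. The function names, shapes, and toy weights are hypothetical, not the video's code; the dense layer here is simply `x @ W + b`.

```python
import numpy as np

def flatten(x):
    # (N, F, H, W) -> (N, F*H*W): one long feature vector per image
    return x.reshape(x.shape[0], -1)

def dense(x, W, b):
    # Fully connected layer: every input feature feeds every output unit
    return x @ W + b

def softmax(z):
    # Subtract each row's max for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def classify(probs):
    # Index of the most probable class for each image
    return probs.argmax(axis=1)

# Two toy 1x2x2 "feature maps" and a 4-input, 3-class dense layer
features = np.zeros((2, 1, 2, 2))
features[0, 0, 0, 0] = 1.0
features[1, 0, 1, 1] = 1.0
W = np.zeros((4, 3))
W[0, 2] = 5.0  # first flattened feature votes for class 2
W[3, 0] = 5.0  # last flattened feature votes for class 0
b = np.zeros(3)

probs = softmax(dense(flatten(features), W, b))
print(classify(probs))  # [2 0]
```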
So let's look at these functions, starting with the convolutional layer function. Have your notebook open with me so you can follow along; the link is in the description. If you don't know, now you know. For our convolutional layer, given some input image, we store our feature maps and the bias value in two variables, features and bias. We define how big our filter, or patch, is going to be, how many features we want, how big our image is, how many channels it has (RGB, so 3), and how many images we have. Given those values, we define a border mode. With the full border mode, the filter is allowed to extend past the bounds of the input by up to the filter size minus 1 on each side; the area outside the input is padded with zeros, so the output is larger than the input. With the valid border mode, the output is smaller than the input, because the convolution is only computed where the input and the filter fully overlap. The two modes can give different classification accuracy, so it's good to test both options. Next, we initialize the feature matrix for this layer as all zeros. Then, for every image we have and for every feature, we initialize a convolved image as empty, and for each of the 3 channels we extract the feature from our feature map, take the channel-specific part of our image, and perform convolution on it using that feature filter. Notice this convolve2d function: that's where the actual convolution operation happens, and the layer function is more of a wrapper around that mathematical operation. Once we have that, we add a bias. The bias acts as an anchor for our network; it's kind of like the y-intercept.
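Here is a minimal loop-based sketch of the convolutional layer described above, supporting both border modes. It is written for clarity rather than speed, and the names and the (N, C, H, W) image / (F, C, kh, kw) filter shapes are assumptions. It flips the kernel, so it computes true convolution (what an FFT-based approach computes) rather than the cross-correlation many libraries use.

```python
import numpy as np

def convolve2d(image, kernel, mode="valid"):
    """Direct 2-D convolution (kernel flipped) in "valid" or "full" border mode."""
    kh, kw = kernel.shape
    if mode == "full":
        # Zero-pad by (filter size - 1) so partial overlaps at the edges are computed
        image = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    elif mode != "valid":
        raise ValueError(f"unknown border mode: {mode}")
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the overlapping patch by the flipped kernel and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

def conv_layer(images, filters, biases, mode="valid"):
    """images: (N, C, H, W); filters: (F, C, kh, kw); biases: (F,)."""
    n_filters, n_channels = filters.shape[:2]
    outputs = []
    for img in images:
        feature_maps = []
        for f in range(n_filters):
            # Convolve each channel with its slice of the filter, sum, add the bias
            fmap = sum(convolve2d(img[c], filters[f, c], mode) for c in range(n_channels))
            feature_maps.append(fmap + biases[f])
        outputs.append(feature_maps)
    return np.array(outputs)

images = np.ones((1, 1, 4, 4))
filters = np.ones((2, 1, 3, 3))
biases = np.array([0.0, 1.0])
print(conv_layer(images, filters, biases, "valid").shape)  # (1, 2, 2, 2)
print(conv_layer(images, filters, biases, "full").shape)   # (1, 2, 6, 6)
```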
It's kind of like a starting point for our model. Then we add the result to our list of convolved features for this layer and return that as our feature map. So let's look at this convolve2d function. In our convolve2d function, we define the tensor dimensions of the image and the feature and compute a target dimension. Then these two lines perform the convolution using the convolution theorem: convolution in the spatial domain is equivalent to elementwise multiplication in the frequency domain. Conceptually, convolution multiplies the input by the kernel at every position and sums the results, and NumPy's fast Fourier transform (fft2) lets us compute that efficiently: transform both, multiply, and transform back. Once we have our target value, we define a starting point and an ending point, and the region within that range, a kind of bounding box, is what we return as the convolved feature. So we start off with our convolutional layer, and then we have our ReLU. ReLU is really simple: starting from a matrix of zeros, it goes through every single value in the input matrix, and if it's a negative number it becomes zero; otherwise it's kept. That's ReLU. So we've talked about ReLU and convolution; now we have to talk about pooling.
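The convolution-theorem trick can be sketched with NumPy's `np.fft.fft2` and `np.fft.ifft2`. This is an illustrative version, not the video's exact function: both arrays are zero-padded to the full output size so the FFT's circular convolution matches ordinary linear convolution, and the "valid" result is a crop of the "full" one. ReLU is included as the vectorized one-liner `np.maximum(x, 0)`, which is equivalent to the element-by-element loop described above.

```python
import numpy as np

def fft_convolve2d(image, kernel, mode="full"):
    """2-D convolution via the convolution theorem: FFT, multiply, inverse FFT."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    # Zero-pad both arrays to the full output size so the FFT's circular
    # convolution equals ordinary (linear) convolution
    full_shape = (ih + kh - 1, iw + kw - 1)
    spectrum = np.fft.fft2(image, s=full_shape) * np.fft.fft2(kernel, s=full_shape)
    full = np.real(np.fft.ifft2(spectrum))
    if mode == "full":
        return full
    if mode == "valid":
        # Keep only the positions where the kernel lies entirely inside the image
        return full[kh - 1:ih, kw - 1:iw]
    raise ValueError(f"unknown border mode: {mode}")

def relu(x):
    # Negative values become 0, everything else passes through unchanged
    return np.maximum(x, 0)

image = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
kernel = np.array([[1.0, 0.0, -1.0]])
print(np.round(fft_convolve2d(image, kernel, "full"), 6))
print(relu(np.array([-2.0, 0.0, 3.0])))
```

For small kernels a direct sliding-window loop can be just as fast; the FFT route pays off as the kernel grows.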
So what does max pooling look like? Given our learned features and our images, we initialize our pooled feature list as empty, and then we take the max values of all of those parts of the input. For each image and for each feature map, for each row we define a starting and ending point using our pool size hyperparameter. We've got a set of rows and columns for each image (notice there's a lot of nesting happening here), so we define start and end points for the columns as well. Then we define a patch, a bounding box, from those starting and ending points and take the max value from that patch using np.max. That patch is what moves around over all parts of the image. We store all of those values in our pooled features matrix right here and return it as the output, and that's what we pass on in the convolutional network. So that's max pooling. We've talked about convolution, ReLU, and max pooling, and then there's dropout. For dropout, we have the probability value we defined, 0.25, and we apply it to the input. What that does is turn some parts of the matrix on or off: each value is either multiplied by 0 or kept, so the data has to learn to either pass through or find a different pathway. That's dropout. Next up are flattening, dense, and softmax. Flattening is just a tensor transformation: we reduce the dimensionality of the input. Then our dense layer is our fully connected layer. This is the generic layer you would see in a feedforward network: input times weight, plus a bias, which is the dot product right here.
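Max pooling and dropout can be sketched as below. The pooling loop mirrors the nested row/column walk described above, assuming non-overlapping windows whose stride equals `pool_size` and dropping any leftover edge rows or columns. For dropout this uses the common "inverted dropout" variant (zero each value with probability p during training and rescale the survivors by 1/(1 - p)), which is a swapped-in choice; the video's code may scale differently. The default p = 0.25 matches the value above.

```python
import numpy as np

def max_pool(features, pool_size=2):
    """Non-overlapping max pooling: (N, F, H, W) -> (N, F, H // p, W // p)."""
    n, f, h, w = features.shape
    out_h, out_w = h // pool_size, w // pool_size
    pooled = np.zeros((n, f, out_h, out_w))
    for i in range(n):                     # each image
        for j in range(f):                 # each feature map
            for r in range(out_h):         # window rows
                r0, r1 = r * pool_size, (r + 1) * pool_size
                for c in range(out_w):     # window columns
                    c0, c1 = c * pool_size, (c + 1) * pool_size
                    patch = features[i, j, r0:r1, c0:c1]
                    pooled[i, j, r, c] = np.max(patch)
    return pooled

def dropout(x, p=0.25, rng=None):
    """Inverted dropout (training time): zero each value with probability p,
    then scale the survivors by 1 / (1 - p) so the expected value is unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

x = np.arange(16.0).reshape(1, 1, 4, 4)
print(max_pool(x)[0, 0])  # 2x2 grid of window maxima: 5, 7, 13, 15
```

At prediction time, inverted dropout is simply switched off, so no extra scaling is needed there.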
This is a dense layer: we take our inputs, multiply them by a weight, and add a bias, which means we perform the dot product between the full input and the full weight matrix. Instead of doing that at every layer, which would be way too computationally expensive for image data, we perform it as one fully connected, or dense, layer at the end. That's a way for us to combine all of our learnings together so we can then squash them with a softmax function. So then we have our softmax layer and our classify function. This is the formula for softmax, programmatically speaking, and what it does is output a set of probability values. Then we classify those values by taking the argmax, the index of the largest probability, and that is our output. So that is our forward pass through the network. Backpropagation works pretty much the same way as I've talked about several times before with gradient descent: we take the partial derivative of our error with respect to our weights and then recursively update our weights using that gradient value. (Gradient, partial derivative, and delta are used interchangeably here.) Here's a great simple example where, after the forward pass, we do the same thing in reverse order.
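In symbols, the dense layer, softmax, classification, and gradient-descent update described here can be written as follows, with x the flattened input, W and b the dense weights and bias, z the K class scores, E the error, and η a learning rate (η is a symbol introduced here for illustration):

```latex
\begin{aligned}
z &= W x + b \\
p_i &= \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K \\
\hat{y} &= \arg\max_i \, p_i \\
W &\leftarrow W - \eta \, \frac{\partial E}{\partial W}
\end{aligned}
```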
So we calculate the gradient of those weights and then multiply them by the previous layer. Then, for our JavaScript portion, we take the drawing from the user. Here's the main code for that paint window in a canvas: we capture the mouse's positions, all of the points in that image, with an event listener. On paint, whenever the user actually starts moving the mouse to draw, and then once the mouse stops clicking and the user hits the submit button, we save that snapshot of the image and feed it into the network. And that's our Flask app: we define two routes, one for our home page and one for sending that image to the network. We can deploy it to the web; there's a Heroku app you can check out. The link is in the description, so check out the notebook as well. And yeah, that's it. Please subscribe for more programming videos, and for now, I've got to do a Fourier transform. So thanks for watching.
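On the server side, the canvas snapshot has to be turned back into image bytes before preprocessing. Assuming the browser sends it as a base64 data URL (for example from the canvas `toDataURL()` method; the video's exact transport isn't shown), a standard-library sketch of the decoding step looks like this. The function name `decode_data_url` is hypothetical, and the Flask route that would receive the string is omitted.

```python
import base64

def decode_data_url(data_url):
    """Return (mime_type, raw_bytes) from a 'data:<mime>;base64,<payload>' string."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)

# A tiny example payload; a real snapshot would be PNG bytes from the canvas
url = "data:image/png;base64," + base64.b64encode(b"\x89PNG fake").decode("ascii")
mime, raw = decode_data_url(url)
print(mime, raw[:4])
```

The decoded bytes would then go through an image library and the reshaping step before reaching the network.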

77 thoughts on Convolutional Neural Networks – The Math of Intelligence (Week 4)

  1. Thank you Siraj.🙏🙏🙏 I just discovered your channel and all the videos are very useful. But I am very sad, wondering why I didn't discover you before.

  2. These are great; you should do a thing about reverse-activating a character recognizer to see how the convolution layers 'see' things. (should be simple; rather than the reverse activating image generation?) and maybe you already have I just haven't seen it 🙂

  3. Thx for the clear video! From which paper did you obtain the images of the 5 layers and lines between them?

  4. A collection of stolen gifs, images and charts from all over Internet explained by an extremely annoying dude who clearly has no knowledge of the subject. BTW, you're neither smart nor cool.

  5. Jesus fucking christ you are exceptionally clear minded and eloquent. Literally every minute was perfectly done. Head and shoulders above any other AI channel on youtube with Lex Fridman's work being a close second. Thank you for being excellent at your craft, it is inspiring.

  6. Hello. Thank you for this good content. Do you know where I can find the webpage you showed in this video? Thank you.

  7. Thanks for the great content! Great pace and ability to transfer a lot of knowledge quickly.

    It would be great to do a video on a generative CNN – have you done that?

  8. The lecture would be much more convenient… if your face weren't there… too overactive.

  9. Can anyone let me know what 'a' and 'b' are, and why is there an offset? Or just link me to the page, couldn't find the link to it. :/

  10. Thank you!! This video assumes a bit of knowledge on the viewer's part, but it builds on the basics of neural networks and expands them to include a few other concepts.
    I suggest watching one neural networks video from 3Blue1Brown first; then this is the missing link to the more advanced videos out there.

  11. The kind of passion and love for the subject which is shown in your lectures proves that you are one of the best in the subject .Hats off to you for doing this great help to the whole of humanity.

  12. Hello Siraj Raval, excellent video.
    May I have the honour of you kindly sending me just the max pooling function code (MATLAB code)? I have already built a CNN, but I am having difficulty with the max pooling function. Thanks
    Please send me the MATLAB code by email.
    Best regards

  13. What features do we set at the beginning? Random, and then they will adjusted through backpropagation?

  14. Hey how are you. You have great videos and really love your work. I have a question if you could help me! In terms of filters and convolutions, how we implement fully connected layer and how it differs from the rest?

  15. The dropout function reminds me of what psychedelics do for you… i.e. forcing you to think of familiar things in different ways.

    (I typed the comment too soon – he mentioned psychedelics too.)

  16. Hello Siraj sir, will this model be able to classify any set of images, such as classifying malaria-infected blood cells?

  17. Great job with all your videos, I like them, and I'd recommend subscribing to all watchers who want to learn something…
    Can you make a video about building Darknet YOLO from scratch? I just want to understand how it computes in Python… thanks
    All this time I've used AlexeyAB's version (GitHub), and the C language makes it a little hard for me to understand how it works (in computation). I hope to learn something by translating it the 'Python way'.

  18. How do you decide what layer to add? I understand it can vary but what makes a good pattern of layers?

    How many layers are there? You stopped at 10, but could it have gone higher, and why did it matter if I used layer_id = 10 vs 11 or 35?

    If I wanted to define my own weights, would I have to manually "draw" the feature onto a filter matrix (e.g.; as seen with the mouse filter)? I'm assuming in the code example you used a preloaded model.

  19. wy its always india bois or yellow peeps doing this computer stuff? where white, black and brown? ai is a racist topic or whats up?

  20. lol copying code without forking them and just writing credits to at the end of readme file. very unprofessional siraj raval

  21. Can someone please help me find where the image is from? I really want to know what paper it comes from. Unfortunately, Google is not a great help.


  23. Whoever understood anything here, speak up. I feel like a buffalo lost in the sea of knowledge. What brought me here in the first place?

  24. Hi, thanks for the video. Using the ReLU activation function transforms values to non-negative ones, right? But an image consists of pixel values from 0 to 255, so where do we get negative values before the ReLU function?

  25. I don't see the code where you train/do backpropagation. Is it possible it was not uploaded? I believe now that the idea is to used the already trained CNN and not to be able to train one ourselves.

  26. WE NEED an AI that will read scientific papers and generate postulations for the world's scientists to ponder. Especially for health and diet and biochemistry.

  27. that dropout analogy was on point!!! I have to say that your way of explaining is very natural, unlike the conventional "read from a presentation" method. Thanks!

  28. 7.22 I am spending most of my days in my research about CNNs , very much stressed about everything, and here this guy comes making everything easier to me along with making me feel beautiful with flying kisses. is not it adorable? <3 love and like for this moment please
