# Defining a Neural Network

**00:00**
Neural networks. We’re going to build a brain out of Python. Actually, that’s a valid statement, but it depends on the definition of “brain”. If it refers to the human brain, nothing could be further from the truth.

**00:15**
The word “neural” evokes visions of the nervous system, and as the CEO of the nervous system, the brain is often associated with neural networks. But neural networks are only loosely inspired by how neuroscientists think the brain might work, and that’s assuming they are on the right track. TL;DR: we cannot reliably replicate the functions of a human brain in Python or any other language, because we don’t know enough about how the human brain works.

**00:45**
But on the other hand, neural networks can still be quite useful for certain tasks. For example, a common type of neural network is the convolutional neural network, or CNN, which you will implement later in the course.

**00:58**
Neural networks are used in classification, regression, generative models, computer vision, voice recognition, and you will get hands-on with neural networks and natural language processing.

**01:12**
The loose inspiration for neural networks is based upon the structure of neurons in the human brain and how neurons are connected to each other, thus forming neural networks. Here’s a picture of a very simple neural network, and it’s likely you’ve seen a similar one.

**01:26**
The circles represent the neurons, or *nodes*, and the lines represent the connections between them. Notice that the nodes are organized into layers, and this particular network has three layers.

**01:38**
The first layer is creatively referred to as the *input layer*, and this is the layer which receives input in the form of feature vectors for your NLP project. The *output layer* returns the predictions.

**01:52**
In between are zero or more intermediate, or hidden, layers. They are called “hidden” because we don’t interact with them directly, unlike the input and output layers. A deep neural network is one with more than two hidden layers.

**02:08**
At the risk of getting mathematical, here is the formula for calculating the output of any single node in a layer. The value is essentially a sum of products: each input value *a* is multiplied by a weight *w*, and the products are summed together with *b*, the bias.

**02:25**
This alone is enough to compute values in a neural network. However, there is a limitation. Notice that the weight times the input plus the bias is a linear function, so at this point, a neural network could only train models representing, or approximating, linear functions.

**02:46**
The real world is much messier than straight lines, so the goal is to be able to train a model approximating any arbitrary function. Thus, neural networks are often referred to as universal function approximators. To introduce nonlinearity, you pass the weighted sum through an activation function *f()* to produce the final output.

**03:08**
A common activation function for hidden layers is ReLU, or the rectified linear unit. It’s essentially the linear function with negative values adjusted to 0.
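ReLU is simple enough to write as a one-liner. This is just a sketch of the idea, not a library implementation:

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)

outputs = [relu(x) for x in [-2.0, -0.5, 0.0, 1.5]]  # [0.0, 0.0, 0.0, 1.5]
```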

**03:19**
The sigmoid function, which squashes values to between 0 and 1, is often used as the output layer for binary classification. Sentiment analysis is a form of binary classification.
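The sigmoid function is also easy to sketch. Values near 0 map to roughly 0.5, large positive values approach 1, and large negative values approach 0, which is why its output is often read as a probability in binary classification:

```python
import math

def sigmoid(x):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```

In sentiment analysis, for example, an output near 1 might be read as “positive” and an output near 0 as “negative.”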

**03:33**
It’s obvious that the values of the weights and bias influence the predictions. The weights and bias are known as *trainable parameters*. That’s because during training the weights and bias are continuously updated.

**03:47**
The initial values are random, and thus the first prediction for the first feature vector is going to be terribly off. However, the prediction and actual label will be given to a cost or loss function.

**04:00**
This measure of error will calculate how good or how bad the prediction was and send the result to an optimizer. The optimizer will adjust the trainable parameters in the model with the goal of better predictions in the future. The process of propagating the error backward through the network to compute these updates is called *backpropagation*. Once the optimizer updates the trainable parameters, the loop starts over with the input layer receiving the next feature vector in the data set.
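That predict-measure-update loop can be sketched with a single “node” and plain gradient descent. This is a toy illustration, not how Keras implements it; the data, learning rate, and starting values are all made up, and a squared-error loss is assumed.

```python
# Toy training loop for one linear node: pred = w*x + b.
# Assumed setup: squared-error loss, plain gradient descent, noiseless data.

w, b = 0.0, 0.0   # trainable parameters (real networks start these randomly)
lr = 0.1          # learning rate: how far the optimizer steps each update

data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2x + 1

for epoch in range(200):
    for x, y in data:
        pred = w * x + b       # forward pass: the prediction
        error = pred - y       # gradient of the squared loss (up to a factor)
        w -= lr * error * x    # optimizer step for the weight
        b -= lr * error        # optimizer step for the bias

# After training, w and b should land near 2 and 1.
```

The terrible first predictions shrink with each pass because every update nudges *w* and *b* in the direction that reduces the loss.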

**04:25**
This loop of updating the trainable parameters is what makes neural networks useful. Again, you don’t need to know the math used to make the loss function or the optimizer, but it’s good to be familiar with the steps. In the next video, you’ll see how to use Keras to realize those steps in code.
