Convolutional Neural Networks (CNN)

Learn Text Classification With Python and Keras Douglas Starnes 05:36

00:44 Here’s an illustration of a convolutional layer in action. Your text will be the input features and the filter will be applied to the first group of features. The output will be stored in a convolution.

00:57 The filter will slide a predetermined distance to the next group of features and produce an output. Notice that the groups are overlapping.
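To make the sliding-filter idea concrete, here is a minimal sketch in plain Python. It is an illustration only: the `conv1d` helper, the feature values, and the filter weights are all made up for this example, and a real Keras convolutional layer also works on embedding vectors rather than single numbers, adds a bias, and applies an activation.

```python
def conv1d(features, kernel, stride=1):
    """Slide `kernel` over `features` and return one output per window."""
    width = len(kernel)
    outputs = []
    # Each window starts `stride` positions after the previous one, so with
    # stride=1 and width=3 neighbouring windows share two features (overlap).
    for start in range(0, len(features) - width + 1, stride):
        window = features[start:start + width]
        outputs.append(sum(f * w for f, w in zip(window, kernel)))
    return outputs


features = [1, 3, 2, 5, 4, 6]   # hypothetical input features
kernel = [1, 0, -1]             # hypothetical filter of size 3

result = conv1d(features, kernel)
print(result)
```

With six features and a filter of size three, the filter fits in four positions, so the output has four values.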

01:08 And that’s all you really need to know about how convolutional layers work because Keras abstracts the math for you. Add a Conv1D layer right before the pooling layer.

01:19 Set the number of filters to 128 and the size of the filter to 5, and then set the activation function to 'relu'.

01:27 Notice that the embedding layer is no longer using the GloVe matrix. The rest of the process is unchanged, so train the model and test it.
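A minimal sketch of what such a model could look like, assuming the `tensorflow.keras` API. Only the Conv1D settings (128 filters, size 5, 'relu') come from the lesson. The vocabulary size, embedding dimension, sequence length, pooling layer type, and dense layers are assumptions chosen for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed values for illustration; use the ones from your own preprocessing.
vocab_size = 5000
embedding_dim = 100
maxlen = 100

model = keras.Sequential([
    keras.Input(shape=(maxlen,)),
    # Trainable embedding, with no pretrained GloVe matrix passed in.
    layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    # The convolutional layer goes right before the pooling layer.
    layers.Conv1D(filters=128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(10, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Training and evaluation would then use `model.fit()` and `model.evaluate()` in the same way as in the earlier lessons.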

01:42 You’ll see that 80% is about the best it can do. Keep in mind that this is a small data set and that neural networks perform better with large data sets. However, there is one more technique that you can employ to improve performance.

01:58 You’ve actually already been exposed to this technique, called hyperparameter optimization, but without the optimization. To review, most of the emphasis so far has been on training the weights of the model.

02:11 You’ve seen that those values change during the training process, but there are also values, such as the vector length and the embedding size, that are set before training.

02:21 These values that are not trained are called hyperparameters, and they still have a lot of influence on the performance of the model. Optimizing these values by hand is impractical, so you can automate the search.

02:34 The search will try different combinations of hyperparameters and tell you the best one. For this course, you’ll see how to use the scikit-learn utility RandomizedSearchCV, together with k-fold cross-validation, to find the best set of values for the hyperparameters. k-fold cross-validation partitions the data into chunks.

02:57 The number of chunks is the value of k. Here, you can see an example of 5-fold cross-validation. On each iteration, a different chunk—or fold—will be used for testing with the remainder used for training, so the iterations will use different combinations of training and testing data.
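The partitioning described above can be sketched in plain Python. This is a simplified illustration, not the scikit-learn implementation: the `k_fold_indices` helper is hypothetical, it does no shuffling, and the sample count of 10 is arbitrary.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                  # this fold is held out
        train = indices[:start] + indices[start + size:]    # everything else trains
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for i, (train, test) in enumerate(folds, start=1):
    print(f"fold {i}: test={test} train={train}")
```

Every sample lands in the test set exactly once across the five iterations, which is what lets each combination of hyperparameters be scored on all of the data.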

03:54 Since the RandomizedSearchCV class is from scikit-learn, it needs an adapter to work with a Keras model, and KerasClassifier is that adapter. This class requires that the model be returned from a function.

04:05 You’ve seen all the code in this function, so I won’t go over it again. You’ve also seen a vast majority of the training code. The first difference is the instance of the KerasClassifier that wraps the model, and this is where you set the keyword arguments that were provided to the .fit() method of the model in the previous exercises.

04:26 The RandomizedSearchCV class accepts the KerasClassifier and the grid. The cv keyword argument is the number of folds, or the k value.

04:37 Finally, call .fit() on the RandomizedSearchCV to start the heavy lifting.
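The search workflow can be sketched as follows. To keep the example small and fast, it swaps in a scikit-learn LogisticRegression on synthetic data as a stand-in for the wrapped Keras model, so the estimator, the data, and the parameter names in the grid are assumptions and not the ones used in the course. The RandomizedSearchCV calls themselves are the standard scikit-learn API.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption; the course uses text data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# The grid maps each hyperparameter name to the values worth trying.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "fit_intercept": [True, False],
}

search = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=1000),  # stand-in for KerasClassifier
    param_distributions=param_grid,
    n_iter=5,          # number of random combinations to try
    cv=5,              # number of folds, the k in k-fold
    random_state=0,
)

# .fit() returns the fitted search object itself.
result = search.fit(X, y)
print("best score:", result.best_score_)
print("best params:", result.best_params_)
```

With a Keras model, the `estimator` argument would instead be the KerasClassifier wrapping the model-building function, and the grid keys would name that function's arguments.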

04:45 The return value from .fit() will include the best score and the parameters producing that score so you can examine them. This can take a little while, so again, I’ll speed it up through the magic of video. Also, if you run this on Google Colab, you’ll have access to free GPUs for jobs up to 12 hours.

05:03 These GPUs aren’t the fastest, but they will accelerate this code so that it takes much less time. The final output is stored in a file that I’ve opened on the right side.

05:15 A better approach, but still not much more than 80%. A logical conclusion is that a larger data set would yield better results. Again, convolutional neural networks and neural networks in general are intended for large data sets. Let’s wrap up this course and review what you saw.

