Classifier Cascades

Traditional Face Detection With Python Austin Cepalia 02:05

A cascade is a series of waterfalls coming one after another. Computer science borrows the idea to solve a complex problem with a chain of simple units. The problem here is reducing the number of computations needed for each image.

To solve it, Viola and Jones turned their strong classifier (consisting of thousands of weak classifiers) into a cascade where, in the simplified picture used here, each weak classifier represents one stage. The job of the cascade is to discard non-faces quickly so that little time and computation is wasted on them.

When an image subregion enters the cascade, it is evaluated by the first stage. If that stage evaluates the subregion as positive, meaning that it thinks it’s a face, then the output of the stage is “maybe.”

If a subregion gets a “maybe,” it is sent to the next stage of the cascade. If that stage also gives a positive evaluation, that’s another “maybe,” and the subregion is sent to the third stage.

This process repeats until the subregion has passed through all stages of the cascade. If every stage approves it, the subregion is finally classified as a human face and presented to the user as a detection.

If, however, the first stage gives a negative evaluation, the subregion is immediately discarded as not containing a human face. If it passes the first stage but fails the second, it is discarded as well. In short, a subregion can be discarded at any stage of the cascade.

The cascade is designed so that non-faces get discarded very quickly, which saves a lot of time and computational resources. Since every classifier represents a feature of a human face, a positive detection says, “Yes, this subregion contains all the features of a human face.” But as soon as one feature is missing, the cascade rejects the whole subregion.
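The pass-or-discard logic described above can be sketched in a few lines of Python. This is a toy illustration, not the Viola-Jones implementation: a subregion is modeled as a dictionary of boolean flags instead of a pixel patch, and `run_cascade` and `make_stage` are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Tuple

Subregion = Dict[str, bool]          # toy stand-in for a patch of pixels
Stage = Callable[[Subregion], bool]  # True means "maybe a face"


def run_cascade(subregion: Subregion, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (is_face, number_of_stages_evaluated) for one subregion."""
    evaluated = 0
    for stage in stages:
        evaluated += 1
        if not stage(subregion):
            # First negative evaluation: discard and skip the remaining stages.
            return False, evaluated
    # Every stage answered "maybe", so the subregion is reported as a face.
    return True, evaluated


def make_stage(feature: str) -> Stage:
    """Build a toy stage that only checks whether one feature flag is set."""
    return lambda subregion: subregion.get(feature, False)


stages = [make_stage("eyes"), make_stage("nose"), make_stage("mouth")]

face = {"eyes": True, "nose": True, "mouth": True}
wall = {"eyes": False, "nose": False, "mouth": False}
no_mouth = {"eyes": True, "nose": True, "mouth": False}

print(run_cascade(face, stages))      # (True, 3)
print(run_cascade(wall, stages))      # (False, 1)
print(run_cascade(no_mouth, stages))  # (False, 3)
```

The second number in each result shows how much work was done: the `wall` subregion is rejected after a single stage, while a real face has to pass all three.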

To accomplish this effectively, it’s important to put your best-performing classifiers early in the cascade. In the Viola-Jones algorithm, the eyes and nose bridge classifiers are examples of best-performing weak classifiers.
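To see why the order matters, here is a small sketch that counts how many stage evaluations two different orderings need on the same toy data. It assumes that “best performing” means “rejects the most non-faces,” which is a simplification made for this example. The data and the `count_evaluations` helper are made up for illustration.

```python
def count_evaluations(subregions, stages):
    """Total number of stage evaluations needed to classify every subregion."""
    total = 0
    for subregion in subregions:
        for stage in stages:
            total += 1
            if not stage(subregion):
                break  # rejected: the remaining stages are skipped
    return total


# Toy stages: each one checks a single boolean flag (hypothetical stand-ins
# for real Haar-like feature tests).
def eyes(subregion):
    return subregion["eyes"]


def nose(subregion):
    return subregion["nose"]


def mouth(subregion):
    return subregion["mouth"]


# One face and nine non-faces. In this made-up data, the eyes test rejects
# every non-face, the nose test rejects three, and the mouth test rejects none.
subregions = (
    [{"eyes": True, "nose": True, "mouth": True}]
    + [{"eyes": False, "nose": True, "mouth": True}] * 6
    + [{"eyes": False, "nose": False, "mouth": True}] * 3
)

best_first = count_evaluations(subregions, [eyes, nose, mouth])
best_last = count_evaluations(subregions, [mouth, nose, eyes])

print(best_first)  # 12
print(best_last)   # 27
```

Both orderings reach the same verdict for every subregion. Only the amount of work changes, and the gap grows with the number of non-face subregions in the image.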

00:00 When the Viola-Jones framework is being used to detect faces, a 24 by 24 pixel subregion moves across the image to detect the presence of faces. In order to figure out if a face is present, it uses what’s called a classifier cascade.
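A rough sketch of how a 24 by 24 window might be moved across an image is shown below. The one-pixel step and the `sliding_windows` name are assumptions made for this example, and the sketch covers a single scale only, whereas a practical detector would also scan at several scales.

```python
def sliding_windows(width, height, window=24, step=1):
    """Yield the (x, y) top-left corner of every window that fits in the image."""
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            yield x, y


# A tiny 26 by 25 pixel image leaves room for 3 positions across and 2 down.
corners = list(sliding_windows(width=26, height=25))

print(len(corners))             # 6
print(corners[0], corners[-1])  # (0, 0) (2, 1)
```

Each corner identifies one subregion that would then be handed to the classifier cascade.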

00:17 The idea of a classifier cascade is to take a strong classifier and search this specific subregion of the image for each of its weak classifiers, one by one.

00:30 Remember, the weak classifiers that make up a strong classifier are the Haar-like features used to detect parts of the face. I know—the terminology gets kind of crazy.

00:42 If there is a face present, we need all of the weak classifiers to be present in the subregion we’re searching.

00:51 So, if any one of the classifiers is missing, then we can assume that this specific subregion does not contain a face and just move on. This dramatically improves efficiency because it prevents us from scanning for all of the other features in a strong classifier if we already know that one of them is missing. Think of it like this: say the strong classifier we are using is made up of three features—the eyes, the nose, and the mouth.

01:20 If we start our search by looking for the eyes and we don’t find them, then what’s the point of searching for the nose and the mouth? We already know that the face does not exist here, so let’s move on to the next subregion and keep scanning. In order to accomplish this effectively, it’s important that we put our best-performing classifiers, aka the ones with the highest weight, early in our cascade.

01:45 And that is basically the entire Viola-Jones object detection framework, minus a whole bunch of math that would make this a lot more confusing—and frankly, pretty boring. Now for the fun part.

01:58 Let’s use this framework along with Python to detect faces within an image.

Pygator on Sept. 14, 2019

Very good series on a machine learning topic of interest for many reasons.

justintylerfarias on April 26, 2020

When discussing AdaBoost, is the strong classifier just an object identified in a portion of the image, or is an image itself a strong classifier for training? When talking about the cascade, it sounds like it’s just a part of an image. I’m having trouble grasping AdaBoost.
