Very Deep Neural Networks Explained in 40 Seconds

Very deep neural networks (VDNN) illustrated with data animation: a 40-second video featuring supervised learning, layers, neurons, fuzzy classification, and convolution filters.

It is said that a picture is worth a thousand words. Here instead, I use a video to illustrate the concept of very deep neural networks (VDNN).

I use a supervised classification problem to explain how a VDNN works. Supervised classification is one of the main tasks in supervised learning. The training set has four groups, each assigned a different color. The type of deep neural network (DNN) described here is a convolutional neural network (CNN): it relies on filtering techniques. The filter is referred to in the literature as a convolution operator, hence the name CNN.


The purpose is to classify any new or future data point outside the training set. In practice, the classifier is not built on the whole training set but on a subset, called here the test set. The remaining training set points form the control set, which is used to check performance and fine-tune parameters. This type of design is called cross-validation. Note that many texts use these names differently and call the held-out points the test or validation set.
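The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_training_set`, the 80/20 proportion, and the random data are assumptions, not taken from the video. A single split like this one is usually called a holdout; repeating it over several different subsets gives the more common k-fold form of cross-validation.

```python
import random

def split_training_set(points, build_fraction=0.8, seed=0):
    """Split labeled points into the subset used to build the classifier
    (the article's "test set") and the held-out rest (the "control set")."""
    rng = random.Random(seed)
    shuffled = points[:]          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labeled data: (x, y, group), with four groups numbered 0..3.
rng = random.Random(42)
training_set = [(rng.random(), rng.random(), rng.randrange(4)) for _ in range(100)]

test_set, control_set = split_training_set(training_set)
print(len(test_set), len(control_set))
```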

Once built, the classifier illustrated in the video classifies any new point outside the training set instantly. This article also illustrates the concepts of fractal (or fuzzy) classification and of machine learning performed on a GPU (graphics processing unit).


The methodology consists of three steps.

Step 1: Transforming the test set into a format suitable as input for the DNN. This may involve rescaling or some mapping (frequently, a logistic mapping) applied to the original data. In our case, the bivariate data was binned and transformed into pixel locations to fit into the video frames. The first frame of the video represents the test set, after the initial mapping.
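As a minimal sketch of Step 1, the code below bins bivariate points into pixel locations. It assumes a plain linear rescaling to the frame size (not the logistic mapping mentioned above), and the names `to_pixel` and `rasterize`, the bounds, and the sample points are all hypothetical. Label 0 stands for a black (unclassified) pixel.

```python
def to_pixel(x, y, bounds, width, height):
    """Map a point (x, y) to integer pixel coordinates (col, row)."""
    xmin, xmax, ymin, ymax = bounds
    col = int((x - xmin) / (xmax - xmin) * width)
    row = int((y - ymin) / (ymax - ymin) * height)
    # Clamp so that points on the upper boundary fall into the last bin.
    return min(max(col, 0), width - 1), min(max(row, 0), height - 1)

def rasterize(points, bounds, width, height):
    """Build the first frame: 0 = unclassified (black), 1..k = group colors."""
    frame = [[0] * width for _ in range(height)]
    for x, y, group in points:
        col, row = to_pixel(x, y, bounds, width, height)
        frame[row][col] = group   # the last point wins if several share a pixel
    return frame

# Hypothetical points: (x, y, group).
points = [(0.0, 0.0, 1), (1.0, 1.0, 2), (0.5, 0.25, 3)]
frame = rasterize(points, bounds=(0.0, 1.0, 0.0, 1.0), width=4, height=4)
for row in frame:
    print(row)
```

Letting the last point win when several points share a pixel is the simplest choice. A finer grid, or a vote among the points in each bin, would be reasonable alternatives.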

Step 2: The transition from one frame to the next, repeated until no unclassified (black) pixels are left, works as follows. You apply a local filter to each pixel to assign its color (the group it is assigned to), using a majority vote among neighboring pixels. In this example, the filter is non-linear. It is similar to a high-pass filter, or to the image-enhancing filters typically used in signal processing. Linear filters, such as averaging or blurring filters, are of no use here. Each frame in the video represents a layer of the DNN. It is called a very deep neural network because it involves a large number of layers (hundreds, in this example).
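The frame-to-frame transition can be sketched as follows. This is not the exact filter used in the video: the 3x3 window, the rule that already colored pixels keep their color, and the tie-breaking by smallest label are assumptions made to keep the example short and deterministic.

```python
from collections import Counter

def step(frame):
    """One layer: each black (0) pixel takes the majority color among its
    colored neighbors in a 3x3 window. Colored pixels are left unchanged."""
    h, w = len(frame), len(frame[0])
    new = [row[:] for row in frame]
    for r in range(h):
        for c in range(w):
            if frame[r][c] != 0:
                continue
            votes = Counter(
                frame[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if frame[rr][cc] != 0
            )
            if votes:
                # Most votes wins; ties go to the smallest label (assumption).
                new[r][c] = min(votes, key=lambda g: (-votes[g], g))
    return new

def run(frame, max_layers=1000):
    """Apply layers until no black pixel is left; return all frames."""
    frames = [frame]
    while any(0 in row for row in frames[-1]) and len(frames) <= max_layers:
        nxt = step(frames[-1])
        if nxt == frames[-1]:
            break                 # nothing changed: no colored pixel to spread
        frames.append(nxt)
    return frames

# Hypothetical 5x5 first frame with two seed pixels in opposite corners.
first = [[0] * 5 for _ in range(5)]
first[0][0], first[4][4] = 1, 2
frames = run(first)
print(len(frames) - 1, "layers")
for row in frames[-1]:
    print(row)
```

Each call to `step` reads only the previous frame, so every pixel is updated from the same input. That is what makes a frame behave like a layer.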

Step 3: The frame obtained once no black pixels are left (in the middle of the video) is the output of the DNN. To classify any future point, compute its pixel location on the image using the mapping in Step 1, and find which color it is assigned to.
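Classifying a new point then reduces to a table lookup, which is why it is instantaneous. The sketch below assumes the same linear pixel mapping as a Step 1 sketch would use; the function `classify` and the small output frame are hypothetical.

```python
def classify(x, y, frame, bounds):
    """Return the group stored at the pixel that the point (x, y) maps to."""
    xmin, xmax, ymin, ymax = bounds
    height, width = len(frame), len(frame[0])
    col = int((x - xmin) / (xmax - xmin) * width)
    row = int((y - ymin) / (ymax - ymin) * height)
    # Clamp so that points on the boundary still land inside the image.
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return frame[row][col]

# Hypothetical 4x4 output frame with four groups, one per quadrant.
final_frame = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
bounds = (0.0, 1.0, 0.0, 1.0)
label = classify(0.9, 0.1, final_frame, bounds)
print(label)
```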

The illustration below is a GIF image, obtained by converting my MP4 video to GIF format with the online EZGif converter. The original video can be viewed on YouTube, here. Each pixel is called a neuron in DNN terminology and, just like in the human brain, interacts only with neighboring neurons in a given layer. Hence the name neural network.

GPU Machine Learning

Since all the machine learning is performed on images using standard filtering techniques (once the original data set is converted to an image), it is easy to run the algorithm in video memory. In other words, it can be done on the GPU, the graphics processing unit. I mention it to explain and illustrate what GPU machine learning means to people unfamiliar with this technology.
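To show why this kind of filtering suits a GPU, here is a sketch that writes one majority-vote layer as whole-array operations, with no per-pixel loop. It uses NumPy, which runs on the CPU; the point is only that the computation consists of the same elementwise and shifted-sum operations that GPU array libraries provide. The function `step_vectorized`, the 3x3 window, and the tie rule (smallest label wins) are assumptions.

```python
import numpy as np

def step_vectorized(frame, n_groups):
    """One majority-vote layer written as whole-array operations."""
    h, w = frame.shape
    # One binary mask per group, then a border of zeros around each mask.
    masks = np.stack([frame == g for g in range(1, n_groups + 1)]).astype(np.int32)
    padded = np.pad(masks, ((0, 0), (1, 1), (1, 1)))
    # Count each group in every 3x3 neighborhood by adding 9 shifted copies.
    votes = np.zeros_like(masks)
    for dr in range(3):
        for dc in range(3):
            votes += padded[:, dr:dr + h, dc:dc + w]
    winner = votes.argmax(axis=0) + 1        # ties go to the smallest label
    has_vote = votes.sum(axis=0) > 0
    # Only black pixels with at least one colored neighbor change color.
    return np.where((frame == 0) & has_vote, winner, frame)

# Hypothetical 5x5 first frame with two seed pixels in opposite corners.
frame = np.zeros((5, 5), dtype=int)
frame[0, 0], frame[4, 4] = 1, 2
layers = 0
while (frame == 0).any():
    frame = step_vectorized(frame, n_groups=2)
    layers += 1
print(layers)
print(frame)
```

The 3x3 sum includes the center pixel, but a black center adds nothing to any group mask, so the vote still comes from the neighbors only.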

Fuzzy or Fractal Classification

Once no black (unclassified) pixels are left, the classifier has accomplished its task. However, in my video, I added extra frames to illustrate the concept of fractal classification. The border between clusters is somewhat porous, or fuzzy. A point close to the border may be assigned to any of the two or three groups adjacent to that border. The extra frames (called layers in DNN terminology) show the border shifting over time. This allows you to compute the probability that a point next to the border belongs to one group or another, by looking at its shifting class assignments over time. I will describe this in more detail in an upcoming article.
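One simple way to turn the shifting assignments into probabilities is to count how often a pixel carries each color across the extra frames. The details are deferred to the upcoming article, so the frequency estimate below is an assumption, and the function `class_probabilities` and the tiny frames are hypothetical.

```python
from collections import Counter

def class_probabilities(frames, row, col):
    """Estimate group membership probabilities for one pixel as the
    fraction of frames in which it carries each group label."""
    counts = Counter(frame[row][col] for frame in frames)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Hypothetical extra frames (2x2 images) recorded after no black pixel is left.
extra_frames = [
    [[1, 1], [1, 2]],
    [[1, 2], [1, 2]],
    [[1, 1], [1, 2]],
    [[1, 2], [1, 2]],
]
probs = class_probabilities(extra_frames, row=0, col=1)
print(probs)
```

A pixel far from any border keeps the same color in every frame and gets probability 1 for that group. A pixel on the border splits its probability among the adjacent groups.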


In this article, I explained in layman’s terms the concepts of deep neural network (DNN), convolutional neural network (CNN), convolution filter, layers and neurons of a neural network, GPU machine learning, and fuzzy classification.

The video illustration uses an unusually large number of layers (video frames), with each neuron (pixel) connected to very few other nearby neurons, namely the neighboring pixels. Hence the term very deep neural network, or VDNN. In my example, I use only one connection per neuron. This leads to a quite granular classifier and offers a few benefits. In practice though, traditional DNNs use far fewer layers, but their neurons are connected to dozens or hundreds of other neurons. In other words, the local filter uses a much larger window.

The methodology is described in detail in my new book, available here. In an upcoming article, I will show an application to unsupervised learning, with a post-processing filter playing the role of the sigmoid mapping in a DNN. This material is already available in my new book.

One thought on “Very Deep Neural Networks Explained in 40 Seconds”

  1. Do you have some related references worth reading on the topic discussed here? I’ve read something about deep clustering (or deep classification) posted here, but I still have to read that article to figure out whether it is related to what I do. I also have a version for unsupervised classification (called clustering) but haven’t yet turned it into a video.

    Finally, the image bitmaps are matrices (2D arrays). If the data had 3 or more variables, you would be dealing with 3D matrices (or higher dimensions). These are called tensors.
