What is a Data Video, How to Produce One in Python, and Why

Watch this 6-minute video, which discusses one aspect of explainable AI. It features a spectacular data video produced in Python, applied to shape classification for meteorites using synthetic data. The video is an extract from my course “Intuitive Machine Learning and explainable AI”, available here, based on the book with the same title, available here.

The video shows how to produce a data video in Python. In this case, the data video features curve fitting (a generalization of regression techniques) applied to 250 different training sets, each fitted with its best-fitting ellipse and displayed at a rate of 20 training sets per second. Each training set has 300 points. The purpose is to estimate the shape of a meteorite, summarized by a few parameters, in order to create a taxonomy of meteorites.
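
To make the fitting step concrete, here is a minimal sketch of one frame's worth of work: generate 300 noisy points on an ellipse, then fit a conic to them. The fitting method used in the course is not described in this article, so the algebraic least-squares fit below is a stand-in, and the function names and parameter values (`sample_ellipse`, `fit_ellipse`, the center, axes, angle and noise level) are assumptions chosen for illustration.

```python
import numpy as np

def sample_ellipse(n, center, axes, angle, noise, rng):
    """Draw n noisy points on a rotated ellipse (illustrative generator)."""
    t = rng.uniform(0.0, 2.0 * np.pi, n)
    x, y = axes[0] * np.cos(t), axes[1] * np.sin(t)
    c, s = np.cos(angle), np.sin(angle)
    pts = np.column_stack((c * x - s * y, s * x + c * y)) + np.asarray(center)
    return pts + rng.normal(0.0, noise, pts.shape)

def fit_ellipse(pts):
    """Least-squares fit of A x^2 + B xy + C y^2 + D x + E y = 1.

    The points are shifted to their mean first, so that the origin of the
    fitted conic lies inside the point cloud and the normalization
    (right-hand side equal to 1) is safe. This is a plain algebraic fit,
    not necessarily the method used in the course.
    """
    shift = pts.mean(axis=0)
    x, y = (pts - shift).T
    design = np.column_stack((x * x, x * y, y * y, x, y))
    coef, *_ = np.linalg.lstsq(design, np.ones(len(x)), rcond=None)
    A, B, C, D, E = coef
    # The conic's center is where its gradient vanishes.
    center = np.linalg.solve([[2 * A, B], [B, 2 * C]], [-D, -E]) + shift
    return coef, center

rng = np.random.default_rng(0)
pts = sample_ellipse(300, center=(2.0, -1.0), axes=(3.0, 1.5),
                     angle=0.4, noise=0.05, rng=rng)
coef, center = fit_ellipse(pts)
is_ellipse = coef[1] ** 2 - 4 * coef[0] * coef[2] < 0  # conic discriminant test
print("estimated center:", center, "| ellipse:", is_ellipse)
```

Repeating this for each of the 250 training sets and drawing one plot per fit would give the frames of the video. An algebraic fit like this one is quick but can be biased when the points cover only a short arc or the noise is large.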

Although each training set differs only slightly from one video frame to the next, the sets change substantially over the course of the video: they cover a partial arc of the parent ellipse, a full arc, various eccentricities (from a circle almost to a line), various scales and orientation angles, and varying amounts of noise. The video illustrates how to use generative models to create rich synthetic data and produce augmented data sets that make predictions more robust. In addition, it shows how to produce confidence regions (a 2D generalization of confidence intervals) without using statistics at all, not even the concept of a random variable or maximum likelihood.
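
The two ideas in this paragraph can be sketched in a few lines. First, a generator whose parameters drift slowly with the frame index produces training sets that look alike from frame to frame yet differ widely across the video. Second, refitting many synthetic training sets drawn with the same parameters gives a cloud of estimates, and the area that cloud occupies can serve as an empirical confidence region, with no probability model involved. The author's exact construction is in the course and the book; the schedule, the helper names (`frame_params`, `sample`, `fitted_center`) and the 90% cutoff below are all assumptions made for illustration.

```python
import numpy as np

N_FRAMES, N_POINTS, FPS = 250, 300, 20  # figures quoted in the article

def frame_params(k):
    """Hypothetical schedule: every parameter drifts a little per frame."""
    u = k / (N_FRAMES - 1)                    # runs from 0 to 1
    return dict(
        axes=(3.0, 3.0 - 2.7 * u),            # circle -> elongated ellipse
        angle=np.pi * u,                      # orientation sweeps half a turn
        arc=np.pi / 2 + 1.5 * np.pi * u,      # quarter arc -> full ellipse
        noise=0.02 + 0.10 * u,                # noise grows over time
    )

def sample(params, rng, n=N_POINTS):
    """Noisy points on an arc of a rotated ellipse centered at the origin."""
    t = rng.uniform(0.0, params["arc"], n)
    a, b = params["axes"]
    c, s = np.cos(params["angle"]), np.sin(params["angle"])
    x, y = a * np.cos(t), b * np.sin(t)
    pts = np.column_stack((c * x - s * y, s * x + c * y))
    return pts + rng.normal(0.0, params["noise"], pts.shape)

def fitted_center(pts):
    """Center of an algebraic least-squares conic fit (a stand-in method)."""
    shift = pts.mean(axis=0)
    x, y = (pts - shift).T
    design = np.column_stack((x * x, x * y, y * y, x, y))
    (A, B, C, D, E), *_ = np.linalg.lstsq(design, np.ones(len(x)), rcond=None)
    return np.linalg.solve([[2 * A, B], [B, 2 * C]], [-D, -E]) + shift

rng = np.random.default_rng(1)

# One training set per video frame.
frames = [sample(frame_params(k), rng) for k in range(N_FRAMES)]
duration_s = N_FRAMES / FPS  # 250 frames at 20 per second

# Empirical region: refit 200 synthetic sets drawn with fixed parameters.
fixed = dict(axes=(3.0, 1.5), angle=0.4, arc=2.0 * np.pi, noise=0.1)
centers = np.array([fitted_center(sample(fixed, rng)) for _ in range(200)])
dist = np.linalg.norm(centers - np.median(centers, axis=0), axis=1)
radius = np.quantile(dist, 0.90)  # disc holding 90% of the fitted centers
print(f"{len(frames)} frames, {duration_s} s, 90% radius = {radius:.4f}")
```

The same replicate-and-refit idea applies to any fitted parameter, such as the axes or the orientation angle, not only the center. A disc is the simplest choice of region here; the shape of the point cloud itself could be used instead.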

While many practitioners try to fit 4 or 6 pictures in one figure to illustrate how a procedure performs on various data sets or for benchmarking purposes, this data video deals with 250 different data sets at once, providing a more compelling visualization. It also shows where the method does well and where it performs less well, exhibiting small biases. And if visual impact is not enough, you can try sound too, as explained in my article “the sound that data makes”, here.

The Python code, available here on my GitHub repository, and the machine learning methodology are explained in my course and in the book. Another example, which solves a classification problem with a very deep, highly sparse, fast-converging neural network combined with computer vision methods, is discussed here. It is also featured in my course and the associated book.