Computer Vision: Shape Classification via Explainable AI

Update: The technical report on this topic is now available in the Resources section, here. Look for the title “Classification of Shapes via Explainable AI” under Free Books and Articles.

A central problem in computer vision is to compare shapes and assess how similar they are. This is used, for instance, in text recognition. Modern techniques involve neural networks. In this article, I revisit a methodology designed in 1914, before computers even existed. It leads to an efficient, automated AI algorithm. The benefit is that the decision process made by this black-box system can be explained (almost) in layman’s terms, and thus easily controlled.

By contrast, neural networks use millions of weights that are impossible to interpret, potentially leading to overfitting. Why they work very well on some data and not so well on other data is a mystery. My “old-fashioned” classifier, adapted to modern data and computer architectures, leads to full control of the parameters. In other words, you know beforehand how fine-tuning the parameters will impact the output. Hence the term explainable AI.

In an ideal world, one would want to blend both methods, to benefit from their respective strengths, and minimize their respective drawbacks. Such blending is referred to as ensemble methods. Also, since we are dealing with sampled points located on a curve (the “shape”), the same methodology also applies to sound recognition.

Mathematical Foundations

There is a little bit of mathematics involved here. It mostly boils down to using polar rather than Cartesian coordinates, combined with rudiments of differential calculus. Familiarity with multivariate sorting also helps. Here I keep the presentation at a high level, leaving equations and technical details to a paper to be published in the next two weeks. However, the Excel spreadsheet offered in this article has most of the formulas implemented.

Figure 1: Comparing two shapes

It is convenient to consider a shape as a set of points, representing the pixels of an image. The center of the image is considered as the origin of the coordinate system. The points in question – all located on the curve – are assumed to be ordered in some way (to be discussed), and indexed by t. Physicists may view the index t as the time variable (discrete or continuous), and the curve as an orbit. Complex shapes may involve multiple curves, even disconnected ones.

For illustration purposes, it is easier to start with mathematical shapes characterized by a parametric polar equation. In this case, the equation is

r_t =g(t), \quad\theta_t=h(t),  \quad \text{with}\quad  t\in T, \quad r_t\geq 0, \quad 0\leq \theta_t\leq 2\pi.

Here g, h are real-valued functions, and T is the index domain. An example with n = 20 points is as follows:

\theta_t=(t+\eta)\bmod{2\pi},\quad r_t=c+d \sin(at) \sin[b(2\pi-t)], \quad t = 2\pi k/n \text{ with } k=0,\dots, n-1.

This example is pictured in Figure 1. The parameter η controls the rotation angle or orientation of the shape. Detailed computations are in the “Data Shapes” tab, in this spreadsheet. You can play with the parameters in the “Dashboard” tab (illustrated in Figure 1) to see how the shapes get transformed.
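As a concrete illustration, this sampling scheme can be sketched in a few lines of Python. The default parameter values a, b, c, d below are my own placeholders, not the ones used in the spreadsheet.

```python
import numpy as np

def shape_points(n=20, a=2.0, b=3.0, c=1.0, d=0.5, eta=0.0):
    """Sample n points of the example shape r_t = c + d sin(at) sin(b(2*pi - t)),
    with theta_t = (t + eta) mod 2*pi, at t = 2*pi*k/n for k = 0, ..., n-1."""
    t = 2 * np.pi * np.arange(n) / n
    theta = (t + eta) % (2 * np.pi)            # eta rotates the shape
    r = c + d * np.sin(a * t) * np.sin(b * (2 * np.pi - t))
    # convert polar to Cartesian for plotting or point-set comparison
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])
```

For example, `shape_points(n=20, eta=0.3)` returns a 20 × 2 array of (x, y) coordinates, one row per sampled point.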

Shape Signature

Each shape (or set of points) is uniquely described by a normalized contour on the unit square, called its signature. The signature does not depend on the location or center of gravity of the shape. It does depend on the orientation, though it is easy to generalize the definition to make it rotation-invariant, or to handle 3D shapes. The first step is to use the center of gravity (centroid) as the origin, and then rescale by standardizing the variance of the radius r_t.

The centroid is a weighted average of the points located on the shape. Typically, the weight is constant. However, if the points are not uniformly distributed on the shape, you may use appropriate weights to correct this artefact. This is illustrated in Figure 2, with detailed computations in the “Shape Signature” tab in my spreadsheet. My technical document, once published, explains how to do it. Note that in Figure 2, the corrected centroid, after reweighting, makes much more sense.

Figure 2: Shape signature, with weighted centroid (gray dot) versus non-weighted (orange dot), and weight distribution
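A weighted centroid along these lines can be sketched as follows. The arc-length weighting scheme is my own illustration of one plausible correction for non-uniform sampling, not necessarily the method used in the technical document.

```python
import numpy as np

def weighted_centroid(points, weights=None):
    """Centroid of the sampled points; constant weights by default."""
    pts = np.asarray(points, dtype=float)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, dtype=float)
    return (pts * w[:, None]).sum(axis=0) / w.sum()

def arc_length_weights(points):
    """Weight each point by the length of the polyline segments around it,
    so that densely sampled regions do not pull the centroid toward them."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)  # P_k to P_{k+1}
    return 0.5 * (seg + np.roll(seg, 1))   # average of the two adjacent segments
```

With constant weights this reduces to the ordinary center of gravity; passing `arc_length_weights(points)` as the weights applies the sampling correction.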

Finally, replace r_t by r_t / (1 + r_t), and θ_t by θ_t / 2π. Here the radius r_t and the angle θ_t are measured using the centroid as the origin. The concept of shape signature is not new; see the references section. Comparing two shapes amounts to comparing their signatures. However, our approach to shape comparison, described in the next section, is a little different.
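Putting the steps together, a signature computation might look like this. It is a sketch based on my reading of the normalization above, with optional weights for the centroid.

```python
import numpy as np

def shape_signature(points, weights=None):
    """Map a shape onto the unit square: measure (theta_t, r_t) from the
    (optionally weighted) centroid, standardize the radius variance, then
    apply r -> r / (1 + r) and theta -> theta / (2*pi)."""
    pts = np.asarray(points, dtype=float)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, dtype=float)
    centroid = (pts * w[:, None]).sum(axis=0) / w.sum()
    dx, dy = (pts - centroid).T
    r = np.hypot(dx, dy)
    r = r / r.std()                            # standardize the radius variance
    theta = np.arctan2(dy, dx) % (2 * np.pi)
    return theta / (2 * np.pi), r / (1 + r)    # both coordinates in [0, 1)
```

Both returned arrays lie in [0, 1), so two signatures can be compared directly on the unit square.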

Comparing Two Shapes

Let us assume that the shapes are available as lists of points or more precisely, pixels. The points need to be properly ordered for comparison purposes. In the mathematical example, the index t provided a natural order — not the best one — but one that leads to a reasonable solution. Both sets of points were ordered according to the same t, and the number of points was the same for both shapes.

With real data sets, proceed as follows. First, compute a “distance” D12 from shape 1 to shape 2, then a “distance” D21 from shape 2 to shape 1. To compute D12, the ordering of the points does not matter: for each point P on shape 1, compute the distance D12(P, QP) to its nearest neighbor QP on shape 2, then average D12(P, QP) over all points P on shape 1. If D12 = 0, then shape 1 is a subset of shape 2. The computation of D21 is similar. Finally, the “distance” D is the maximum of D12 and D21, so that D = 0 if and only if both sets of points are identical.
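This computation can be sketched as follows. Note that I take the maximum of D12 and D21, so that D = 0 holds exactly when the two point sets coincide; the brute-force pairwise search is fine for small point sets.

```python
import numpy as np

def directed_distance(A, B):
    """Average distance from each point of A to its nearest neighbor in B.
    Zero means every point of A also lies on B (A is a subset of B)."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return pairwise.min(axis=1).mean()

def shape_distance(A, B):
    """Symmetrized distance: zero if and only if the point sets are identical."""
    return max(directed_distance(A, B), directed_distance(B, A))
```

For large point clouds, the pairwise distance matrix should be replaced with a spatial index such as a k-d tree (e.g. scipy.spatial.cKDTree).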

The metric D is closely related to the Hausdorff distance (first introduced in 1914), but less sensitive to outliers.

Comparison Based on Correlation

Let ρ1(P) denote the distance between point P on shape 1 and the centroid of shape 1. If QP is the nearest point to P on shape 2, let ρ2(QP) be the distance between QP and the centroid of shape 2. Let σ1 and σ2 be the standard deviations of ρ1(P) and ρ2(QP) respectively, computed across all points P on shape 1. Now define

\lambda_{12}=\frac{1}{n_1\sigma_1\sigma_2}\sum_{P\in S_1}\left(\rho_1(P)-\mu_1\right)\left(\rho_2(Q_P)-\mu_2\right),

where n1 is the number of (sampled) points on shape 1, S1 denotes shape 1, and μ1, μ2 are the means of ρ1(P) and ρ2(QP) across all points P on shape 1. By the Cauchy–Schwarz inequality, −1 ≤ λ12 ≤ 1, with λ12 = 1 when shape 1 is an exact subset of shape 2 (after scale standardization). Define λ21 in a similar way (swapping the roles of shape 1 and shape 2) and λ = min(λ12, λ21). Now λ measures the correlation between shape 1 and shape 2. A more useful metric is –λ log(1 – λ). Based on test data in the spreadsheet, a value above 8 means that the two shapes are quite similar, while a value below 4 means dissimilarity.
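A sketch of this computation follows. Centering the radii around their means is my reading of the correlation interpretation (it makes λ12 a Pearson-style correlation, bounded by 1 in absolute value); the nearest-neighbor pairing reuses the matching from the previous section.

```python
import numpy as np

def lambda_12(rho1, rho2):
    """Correlation between the centroid radii rho1(P) on shape 1 and the
    radii rho2(Q_P) of their nearest neighbors on shape 2. Centering around
    the means keeps the value in [-1, 1]."""
    rho1, rho2 = np.asarray(rho1, dtype=float), np.asarray(rho2, dtype=float)
    n = len(rho1)
    return ((rho1 - rho1.mean()) * (rho2 - rho2.mean())).sum() / (n * rho1.std() * rho2.std())

def similarity_score(lam):
    """The transformed metric -lambda * log(1 - lambda); larger means more similar."""
    return -lam * np.log(1 - lam)
```

Identical radius sequences give λ12 = 1, and the transformed score grows quickly as λ approaches 1, which is what makes it a sharper discriminator than λ itself.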

Note that λ depends on the orientation of the shapes, denoted as η1 and η2 in the spreadsheet. If orientation is irrelevant, define λ as the maximum of λ(η1, η2) over all orientations. The λ computed in the spreadsheet is a simplified version, based neither on nearest neighbors nor on λ12 and λ21; it still performs quite well. The mathematical explanations are in my technical paper, to be published in two weeks. To get access to this paper once published, subscribe to our newsletter, here.

Synthetic Data

My research heavily relies on synthetic data. To test on real shapes (say, letters of the alphabet), I recommend adding synthetic data to your training set. A blend of synthetic and real data is called augmented data.

One of the benefits of synthetic data is that you can test various shape classifiers to find the best performers for a specific type of shape. You can include shapes that are almost identical in their mathematical parameters, to find the smallest differences that your algorithm can detect. Likewise, you can use trillions of shapes in your training set, indeed infinitely many, by playing with the shape equations in the spreadsheet, to assess the discriminating power of your classifier.

Note that in the “Dashboard” tab of the spreadsheet, you can add noise to the data set. The amount of noise is determined by the Precision parameter: the lower the value, the more noise. Download the spreadsheet here.
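The noise mechanism can be mimicked in code. Mapping Precision to a noise level as sigma = 1/precision is my own convention for illustration, not necessarily the spreadsheet's.

```python
import numpy as np

def add_noise(points, precision=10.0, seed=None):
    """Perturb each point with Gaussian noise; lower precision means more noise.
    The sigma = 1/precision mapping is an illustrative choice."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    return pts + rng.normal(scale=1.0 / precision, size=pts.shape)
```

Testing a classifier on noisy copies of the same synthetic shape is a quick way to measure how robust it is to the Precision setting.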

Figure 3: Another interesting shape (see PDF technical report)


References

You can find references by googling “shape comparison machine learning”, “shape matching”, or “shape signature”. Below is a small selection.

  1. Shape Descriptor / Feature Extraction Techniques. Fred Park, 2011. UCI iCAMP 2011. Available here.
  2. Shape Signature Matching for Object Identification Invariant to Image Transformations and Occlusion. Stamatia Giannarou and Tania Stathaki, 2007. Available here.
  3. Shape Matching. Kristen Grauman, 2008. University of Texas, Austin. Available here.
  4. Geometrical Correlation and Matching of 2D Image Shapes. Yu Vizilter and Sergey Zheltov, 2012. Available here.
