A 90% confidence region is a domain of minimum area containing 90% of the mass of a distribution. By distribution, here I mean a bivariate probability distribution, though the concept is not specific to machine learning. The 90% is called the *confidence level*, and I denote it as *γ*. Confidence regions generalize confidence intervals to two dimensions. They are typically represented using contour maps.

One may argue that ellipses are widely used because they are level curves of quadratic functions, the simplest generalization of linear functions. But here, there is a much deeper reason, and it is much easier to understand than you think. Many statisticians take it for granted that the region should be an ellipse, but I never found a real justification. This article fills this gap. I discuss the elliptic case first, and then provide a non-elliptic example.

## The Shape of a Confidence Region

While this is nowhere mentioned in the statistical literature, it makes sense to assume that the confidence region is of minimum area. Determining its shape is then a variational problem. Such problems are solved with methods from functional analysis and the calculus of variations, which involve functional, differential and integral equations. These topics are rather advanced.

The most famous example is the brachistochrone problem: determining the curve of fastest descent between a point A and a lower point B, for a bead sliding without friction under gravity alone. The problem was posed by Johann Bernoulli in 1696. The solution is illustrated below. Contrary to popular belief, the straight line is not the fastest path: the fastest one is an arc of a cycloid.

Interestingly, finding the shape of a confidence region of minimum area is perhaps the most elementary problem in this class. Think of a bivariate bell curve. A confidence region of minimum confidence level (0%) is reduced to a point. As you increase the confidence level, the region expands. To keep a minimum area at all times, it must gain mass as fast as possible relative to its growth in area, which means it must always absorb the points where the density is highest. Thus it starts as a point at the maximum of the density, and expands downwards following contour lines at all times. In other words, the boundary of a confidence region is a contour line of the underlying density.
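The "expand along contour lines" argument can be checked numerically. The sketch below is a hypothetical discretization, assuming a standard bivariate normal density (my choice of example, not from the text) and a square grid of equal-area cells. To capture mass *γ* with as few cells as possible, one adds cells in decreasing order of density, so the selected cells form a superlevel set $\{H \ge \text{threshold}\}$. For comparison, it also computes the area of a centered square holding the same mass. The names `greedy_min_area` and `centered_square_area` are illustrative only.

```python
# Illustrative sketch; function names are hypothetical, not from the article or a library.
import math

import numpy as np


def std_normal_density(x, y):
    """Standard bivariate normal density (assumed example density)."""
    return np.exp(-(x**2 + y**2) / 2) / (2 * math.pi)


def greedy_min_area(gamma, half_width=5.0, h=0.02):
    """Area of the fewest equal-area grid cells that together hold mass >= gamma.

    Every cell has area h*h, so the cheapest way to reach mass gamma is to add
    cells in decreasing order of density: the selected cells form a
    superlevel set {H >= threshold}, i.e. the region grows along contour lines.
    """
    t = np.arange(-half_width, half_width + h / 2, h)
    x, y = np.meshgrid(t, t)
    mass = std_normal_density(x, y).ravel() * h * h
    order = np.argsort(mass)[::-1]                # highest density first
    cum = np.cumsum(mass[order]) / mass.sum()     # normalized cumulative mass
    n_cells = int(np.searchsorted(cum, gamma)) + 1
    return n_cells * h * h


def centered_square_area(gamma):
    """Area of the square [-a, a]^2 holding mass gamma (bisection on a)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        a = (lo + hi) / 2
        # For independent standard normals, P(|X| <= a, |Y| <= a) = erf(a / sqrt(2))^2.
        if math.erf(a / math.sqrt(2)) ** 2 < gamma:
            lo = a
        else:
            hi = a
    return (2 * hi) ** 2


gamma = 0.9
print(greedy_min_area(gamma), 2 * math.pi * math.log(10))
print(centered_square_area(gamma))
```

For *γ* = 0.9 the greedy area should match, up to grid error, $2\pi \ln 10 \approx 14.47$, the area of the disk bounded by the corresponding contour line, while the centered square needs about 15.2 for the same mass.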

In mathematical terms, if $H(x, y)$ is the probability density and *γ* the confidence level, the boundary of a confidence region of level *γ* is the contour line $H(x, y) = G_\gamma$. Here $G_\gamma$ is a function of *γ*, chosen so that the volume under the density, delimited by the contour line in question, is equal to *γ*, that is, $\iint_{H(x,y) \ge G_\gamma} H(x, y)\,dx\,dy = \gamma$. The details (with illustrations, simulations, and spreadsheet computations) can be found in Exercise 28 and in section 3.1 of my new book, available here.
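Computing $G_\gamma$ does not require integrating over the contour explicitly. If $(X, Y)$ is drawn from $H$, then $P(H(X, Y) \ge G_\gamma) = \gamma$, so $G_\gamma$ is the $(1 - \gamma)$ quantile of the random variable $H(X, Y)$. The Monte Carlo sketch below relies on this, assuming the density can be both sampled and evaluated, and that $H(X, Y)$ has a continuous distribution (no flat plateaus in the density). The test density, a standard bivariate normal, is an assumption chosen because it has a closed form: the region $H \ge G$ is a disk of radius $r$ with mass $1 - e^{-r^2/2}$, which gives $G_\gamma = (1 - \gamma)/(2\pi)$. The helper `estimate_level` is hypothetical.

```python
# Illustrative sketch; estimate_level and the samplers are hypothetical helpers.
import math

import numpy as np


def std_normal_density(z):
    """Standard bivariate normal density at each row of z (assumed example)."""
    return np.exp(-0.5 * np.sum(z**2, axis=1)) / (2 * math.pi)


def std_normal_sampler(rng, n):
    """Draw n points from the standard bivariate normal."""
    return rng.standard_normal((n, 2))


def estimate_level(density, sampler, gamma, n=200_000, seed=0):
    """Monte Carlo estimate of G_gamma.

    If (X, Y) ~ H, then P(H(X, Y) >= G_gamma) = gamma, so G_gamma is the
    (1 - gamma) quantile of the random variable H(X, Y).
    """
    rng = np.random.default_rng(seed)
    sample = sampler(rng, n)
    return float(np.quantile(density(sample), 1 - gamma))


gamma = 0.9
g_hat = estimate_level(std_normal_density, std_normal_sampler, gamma)
g_exact = (1 - gamma) / (2 * math.pi)   # closed form, valid for this density only
print(g_hat, g_exact)
```

The same function works for any density you can sample from and evaluate; only the closed-form check is specific to the Gaussian case.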

## Elliptic Shape

In many bivariate statistical estimation problems, due to the central limit theorem, the parameter estimators asymptotically have a Gaussian distribution. That is, the limiting probability density (when the sample size is large) is, up to a constant factor, the exponential of a negative-definite bivariate quadratic function. Since the logarithm is monotonic, one can work with the log-density instead, which preserves the shape of the confidence region and the one-to-one mapping between *γ* and $G_\gamma$. The boundary of the confidence region is then a level curve of the quadratic function in question. Thus, it is an ellipse! See figure below.
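To make this concrete: for a bivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$, the log-density is a constant minus $\tfrac{1}{2}(z - \mu)^\top \Sigma^{-1} (z - \mu)$, so the contour $H = G_\gamma$ is the ellipse $(z - \mu)^\top \Sigma^{-1} (z - \mu) = c_\gamma$. In two dimensions this quadratic form follows a chi-square distribution with 2 degrees of freedom, whose *γ* quantile is $c_\gamma = -2 \ln(1 - \gamma)$. The sketch below, with hypothetical values of $\mu$ and $\Sigma$ standing in for estimated parameters, computes the semi-axes from the eigen-decomposition of $\Sigma$ and the area $\pi c_\gamma \sqrt{\det \Sigma}$, checks that the density on the ellipse equals $G_\gamma = (1 - \gamma)/(2\pi\sqrt{\det \Sigma})$, and checks the coverage by simulation.

```python
# Illustrative sketch; mu, cov and the function names are hypothetical.
import math

import numpy as np


def gaussian_confidence_ellipse(mu, cov, gamma):
    """Ellipse bounding the minimum-area gamma-region of N(mu, cov) in 2D.

    Boundary: (z - mu)^T cov^{-1} (z - mu) = c, with c = -2 ln(1 - gamma),
    the gamma-quantile of a chi-square with 2 degrees of freedom.
    Returns c, semi-axis lengths, axis directions (columns) and area.
    """
    c = -2.0 * math.log(1.0 - gamma)
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov is symmetric
    semi_axes = np.sqrt(c * eigvals)
    area = math.pi * c * math.sqrt(np.linalg.det(cov))
    return c, semi_axes, eigvecs, area


def gaussian_density(z, mu, cov):
    """Bivariate normal density at a single point z."""
    d = z - mu
    q = d @ np.linalg.solve(cov, d)
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(np.linalg.det(cov)))


mu = np.array([1.0, -0.5])                        # hypothetical estimates
cov = np.array([[2.0, 0.8], [0.8, 1.0]])
gamma = 0.9
c, semi_axes, axes, area = gaussian_confidence_ellipse(mu, cov, gamma)

# The density is constant on the ellipse and equal to G_gamma.
tip = mu + semi_axes[0] * axes[:, 0]              # end of one semi-axis
g_gamma = (1 - gamma) / (2 * math.pi * math.sqrt(np.linalg.det(cov)))
print(gaussian_density(tip, mu, cov), g_gamma)

# Monte Carlo check: the fraction of draws inside the ellipse should be close to gamma.
rng = np.random.default_rng(1)
z = rng.multivariate_normal(mu, cov, size=100_000)
d = z - mu
q = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
print(np.mean(q <= c), area)
```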

## Non-elliptic Shape

To be more precise, the boundary of a confidence region has the general form $H(x, y, p, q) = G_\gamma$. Note that I added *p* and *q* to the equation: typically these are the estimated values of your two parameters. The probability density actually depends on these parameters. In a number of cases, they represent the expectation and variance.

In my new book (see here), I introduced the concept of *dual confidence region*, in section 3.1. It is also briefly explained in this article, and will be the topic of an upcoming article, along with minimum contrast estimators. In a nutshell, dual confidence regions are obtained by swapping (*x*, *y*) and (*p*, *q*) in H(*x*, *y*, *p*, *q*). They are more intuitive. The resulting confidence region is no longer an ellipse, but in practice it is still very close to one.

Now if your bivariate probability density has multiple modes (for instance, if you are dealing with a mixture of distributions), then confidence regions are generally not ellipses at all. See the illustration below, featuring various contour lines (that is, boundaries of confidence regions) of a bimodal density.

The above plot was produced using Mathematica, with the following code:

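```mathematica
(* Unnormalized bimodal density: sum of two generalized Gaussian bumps
   (exponents 3.5 and 4.2), with modes near (0, 0) and (1.5, 1.4).
   MeshFunctions -> {#3 &} draws mesh lines at constant height z,
   i.e. contour lines of the density. *)
Plot3D[Exp[-(Abs[x]^3.5 + Abs[y]^3.5)] +
  0.8*Exp[-4*(Abs[x - 1.5]^4.2 + Abs[y - 1.4]^4.2)], {x, -2, 3},
 {y, -2, 3}, MeshFunctions -> {#3 &}, Mesh -> 25,
 Exclusions -> None, PlotRange -> {Automatic, Automatic, {0, 1}},
 ImageSize -> 600]
```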

In this example, depending on the confidence level, the confidence region may consist of two disjoint (non-connected) sets.
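The split into two sets can be reproduced numerically. The sketch below evaluates the same unnormalized density as the Mathematica code, on the same window $[-2, 3]^2$, finds the threshold $G_\gamma$ by accumulating grid cells in decreasing order of density, and counts the connected pieces of the region $\{H \ge G_\gamma\}$ with a breadth-first search. The grid resolution, the 8-neighbour connectivity and the three confidence levels are my assumptions for illustration, and the function names are hypothetical.

```python
# Illustrative sketch; function names, grid and confidence levels are hypothetical choices.
from collections import deque

import numpy as np


def bimodal_density(x, y):
    """Unnormalized density used in the Mathematica plot."""
    return (np.exp(-(np.abs(x) ** 3.5 + np.abs(y) ** 3.5))
            + 0.8 * np.exp(-4 * (np.abs(x - 1.5) ** 4.2 + np.abs(y - 1.4) ** 4.2)))


def region_mask(dens, gamma):
    """Cells of the minimum-area region holding a fraction gamma of the grid mass."""
    values = np.sort(dens.ravel())[::-1]          # highest density first
    cum = np.cumsum(values) / values.sum()
    threshold = values[np.searchsorted(cum, gamma)]   # discrete G_gamma
    return dens >= threshold


def count_components(mask):
    """Number of 8-connected components of True cells (breadth-first search)."""
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    count = 0
    for i, j in zip(*np.nonzero(mask)):
        if seen[i, j]:
            continue
        count += 1                                 # new, unvisited piece
        seen[i, j] = True
        queue = deque([(i, j)])
        while queue:
            a, b = queue.popleft()
            for da in (-1, 0, 1):
                for db in (-1, 0, 1):
                    na, nb = a + da, b + db
                    if (0 <= na < rows and 0 <= nb < cols
                            and mask[na, nb] and not seen[na, nb]):
                        seen[na, nb] = True
                        queue.append((na, nb))
    return count


t = np.linspace(-2, 3, 251)                        # grid step 0.02
x, y = np.meshgrid(t, t)
dens = bimodal_density(x, y)
for gamma in (0.2, 0.45, 0.9):
    print(gamma, count_components(region_mask(dens, gamma)))
```

At a low confidence level only the taller mode is covered; at intermediate levels a second piece appears around the smaller mode; once $G_\gamma$ drops below the density at the saddle point between the two modes, the two pieces merge into a single set.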

Someone replying to one of my posts wrote the following: “That’s basically the definition of a Bayesian HPD (highest posterior density) interval. But that is *not* the same thing as a traditional confidence interval (or region).” I guess he was referring to my dual confidence regions. Anyway, food for thought.