New Interpolation Methods for Data Synthetization and Prediction

The full version of this article, entitled “New Interpolation Methods for Synthetization and Prediction”, is available in PDF format in the “Free Books and Articles” section, here. It is an extract from my book “Synthetic Data and Generative AI”, available here.

I describe little-known, original interpolation methods with applications to real-life datasets. These simple techniques are easy to implement and can be used for regression or prediction. They offer an alternative to model-based statistical methods. Applications include interpolating ocean tides at Dublin, predicting temperatures in the Chicago area with geospatial data, and a problem in astronomy: planet alignments and the frequency of these events. In one example, measurements taken every 5 minutes can be replaced by measurements taken every 80 minutes, with the 5-minute values reconstructed via interpolation, without noticeable loss. Thus, my algorithm can be used for data compression.
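To make the compression idea concrete, here is a minimal sketch on a synthetic, tide-like signal. It is not the algorithm from the article and does not use the Dublin data: it swaps in standard FFT zero-padding (trigonometric interpolation), and the signal, its amplitudes, and the function name `fft_upsample` are assumptions made for illustration. The signal is exactly periodic over the window and band-limited, an idealized case where the reconstruction is essentially exact.

```python
import numpy as np

# Assumed synthetic signal: 1024 samples at 5-minute spacing, with two
# components that complete a whole number of cycles in the window.
n_fine, step = 1024, 16            # 16 x 5 min = 80 min between kept samples
t = np.arange(n_fine) * 5.0        # time in minutes
window = n_fine * 5.0
fine = (1.2 * np.cos(2 * np.pi * 7 * t / window)
        + 0.4 * np.sin(2 * np.pi * 4 * t / window + 0.3))

# "Compression": keep one sample out of 16 (the 80-minute series).
coarse = fine[::step]

def fft_upsample(samples, factor):
    """Trigonometric interpolation by zero-padding the spectrum.

    Assumes the samples cover whole periods of a band-limited signal
    with no energy at the coarse Nyquist frequency.
    """
    m = len(samples)
    spectrum = np.fft.rfft(samples)
    padded = np.zeros(m * factor // 2 + 1, dtype=complex)
    padded[:len(spectrum)] = spectrum
    # irfft divides by the output length, so rescale by the factor.
    return np.fft.irfft(padded, n=m * factor) * factor

reconstructed = fft_upsample(coarse, step)
max_err = np.max(np.abs(reconstructed - fine))
print(len(coarse), "samples kept out of", n_fine)
print("max reconstruction error:", max_err)
```

Real measurements are noisy and not exactly periodic over the observation window, so the error on actual data would be larger than in this idealized setting.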

The first technique has strong ties to Fourier methods. In addition to the above applications, I show how it can be used to efficiently interpolate complex mathematical functions such as the Bessel and Riemann zeta functions. For those familiar with MATLAB or Mathematica, this is an opportunity to play with the mpmath library in Python and see how it compares with the traditional tools in this context. In the process, I also show how the methodology can be used to generate synthetic data, be it time series or geospatial data.

Depending on the parameters, in the geospatial context, the interpolation is either close to nearest-neighbor methods, close to kriging (also known as Gaussian process regression), or a truly original hybrid mix of additive and multiplicative techniques. There is an option not to interpolate at locations far away from the training set, where regression or interpolation results may be meaningless, regardless of the technique used. Another application is detecting the full extent of an oil field after drilling only a dozen wells. Likewise, the temperature dataset has only a few stations with actual measurements, and the goal is to obtain interpolated values fully covering a specific area.
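The following sketch illustrates two of the ideas in this paragraph: a parameter that pushes the interpolation toward nearest-neighbor behavior, and the option to skip locations too far from the training set. It uses plain inverse distance weighting, a standard technique swapped in for illustration, not the hybrid method from the article. The station coordinates, temperature values, and the names `idw_interpolate`, `power` and `max_dist` are all hypothetical.

```python
import numpy as np

def idw_interpolate(stations, values, targets, power=2.0, max_dist=1.0):
    """Inverse distance weighting with a distance cutoff.

    Returns NaN at targets whose nearest station is farther than max_dist.
    A large `power` makes the result approach the nearest-neighbor value.
    """
    stations = np.asarray(stations, dtype=float)
    values = np.asarray(values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Pairwise distances, shape (n_targets, n_stations).
    dist = np.linalg.norm(targets[:, None, :] - stations[None, :, :], axis=2)
    out = np.full(len(targets), np.nan)
    for i, row in enumerate(dist):
        nearest = np.argmin(row)
        if row[nearest] > max_dist:
            continue                    # too far from the data: no estimate
        if row[nearest] == 0.0:
            out[i] = values[nearest]    # target coincides with a station
            continue
        weights = row ** (-power)
        out[i] = np.sum(weights * values) / np.sum(weights)
    return out

# Hypothetical stations on a unit square, with made-up temperatures.
stations = [(0, 0), (1, 0), (0, 1), (1, 1)]
temps = [10.0, 12.0, 14.0, 16.0]
targets = [(0, 0), (0.5, 0.5), (5, 5), (0.1, 0.1)]

smooth = idw_interpolate(stations, temps, targets, power=2.0)
sharp = idw_interpolate(stations, temps, targets, power=50.0)
print(smooth)
print(sharp)
```

With `power=2.0` the center of the square gets the average of the four stations, while `power=50.0` makes the estimate at (0.1, 0.1) nearly identical to the closest station. The point (5, 5) is left as NaN in both cases because it lies beyond `max_dist`.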

The second technique is based on ordinary least squares (the same method used to solve polynomial or multivariate regression), but instead of highly unstable polynomials leading to overfitting, I focus on generic functions that avoid these pitfalls, using an iterative greedy algorithm to find the optimum. In particular, a solution based on orthogonal functions leads to a very simple implementation with a direct and elegant solution.
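As a minimal sketch of why orthogonal functions simplify least squares, the example below fits a small Fourier basis to synthetic data. It covers only the orthogonal case and does not implement the iterative greedy algorithm. The data, the noise level, and the helper name `fourier_basis` are assumptions. On an equally spaced grid covering one full period, the basis columns are mutually orthogonal, so each coefficient is a simple projection and no linear system needs to be solved.

```python
import numpy as np

n = 200
t = np.arange(n) / n                     # one full period, endpoint excluded
rng = np.random.default_rng(0)
# Assumed synthetic response: constant + two harmonics + small noise.
y = (1.5 + 2.0 * np.sin(2 * np.pi * t)
     - 0.7 * np.cos(2 * np.pi * 3 * t)
     + 0.1 * rng.standard_normal(n))

def fourier_basis(t, n_harmonics):
    """Columns: 1, cos(2*pi*k*t), sin(2*pi*k*t) for k = 1..n_harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t))
        cols.append(np.sin(2 * np.pi * k * t))
    return np.column_stack(cols)

X = fourier_basis(t, 4)

# Orthogonal columns: the normal equations are diagonal, so each
# coefficient is <y, f_k> / <f_k, f_k>.
coef_proj = (X.T @ y) / np.sum(X ** 2, axis=0)

# Generic least squares solver, for comparison.
coef_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(coef_proj, 3))
print("max difference vs. lstsq:", np.max(np.abs(coef_proj - coef_ols)))
```

The two sets of coefficients agree up to rounding error. Adding or removing a basis function does not change the other coefficients, which is not true for a polynomial basis.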

Table of Contents

  1. Introduction
  2. First method
    . . . Example with infinite summation
    . . . Applications: ocean tides, planet alignment
    . . . Problem in two dimensions
    . . . Spatial interpolation of the temperature dataset
  3. Second method
    . . . From unstable polynomials to robust orthogonal regression
    . . . Using orthogonal functions
    . . . Application to regression
  4. Python code
    . . . Time series interpolation
    . . . Geospatial temperature dataset
    . . . Regression with Fourier series

Temperature map in the Chicago area: real data (round dots) blended with synthetic data

Download the Article

The technical article, entitled “New Interpolation Methods for Synthetization and Prediction”, is accessible in the “Free Books and Articles” section, here. It contains links to my GitHub files, so that you can easily copy and paste the code. The text highlighted in orange in the PDF document marks keywords that will be incorporated in the index when I aggregate all my related articles into books about machine learning, visualization and Python, similar to these ones. The text highlighted in blue corresponds to external clickable links, mostly references. Red is used for internal links pointing to a section, bibliography entry, equation, and so on.


About the Author

Vincent Granville is a pioneering data scientist and machine learning expert, co-founder of Data Science Central (acquired by TechTarget in 2020), founder of MLTechniques.com, former VC-funded executive, author and patent owner. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. Vincent is also a former post-doc at Cambridge University and the National Institute of Statistical Sciences (NISS).

Vincent published in Journal of Number Theory, Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of multiple books, including “Intuitive Machine Learning and Explainable AI”, available here. He lives in Washington state, and enjoys doing research on spatial stochastic processes, chaotic dynamical systems, experimental math and probabilistic number theory.
