Math for Machine Learning: 14 Must-Read Books

It is possible to design and deploy advanced machine learning algorithms that are essentially math-free and stats-free. The people developing them are typically professional mathematicians, and these algorithms are not necessarily simpler. See, for instance, a math-free regression technique with prediction intervals, here, or a supervised classification technique and an alternative to t-SNE, here. Interestingly, this latter math-free technique was used to gain insights into a very difficult pure math problem in number theory.

[Figure: Gradient descent]

Math-free is a misnomer, in the sense that it still requires middle school arithmetic. But the author of these techniques, a real mathematician, considers that middle school arithmetic (the way it is taught) is not math but mechanical manipulation. However, for the majority of machine learning professionals, a good math and statistics background is required. Everyone agrees on this these days, which was not the case 10 years ago. The following books serve that purpose.

Books Focusing on the Math

Most of the following books were published in the last few years and rapidly gained popularity. They were written with modern machine learning applications in mind. Most are free, available online or in PDF format, and have their own websites. Some also have a print version, which is useful for annotations or reading in bed.

1. Model-based Machine Learning. Published in 2022, this book is still being written at this time. It has a strong emphasis on probability models, with applications ranging from recommendation engines to crowdsourcing. The concepts are illustrated and explained in simple English, making the book accessible to a large audience. The author, John Winn, is a senior principal researcher at Microsoft. Read the book.

2. Mathematics for Machine Learning. Published in 2020. By Marc Deisenroth, A. Faisal and Cheng Soon Ong, Cambridge University Press. Solid math foundations are presented in an academic style (there is no code), with applications to linear regression, dimensionality reduction (principal component analysis), density estimation (Gaussian mixture models) and classification (support vector machines). It covers linear algebra, tensors (multidimensional arrays, such as p × q × r arrays), Bayesian modeling, and calculus, including the math behind gradient descent. Read the book.
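To give a flavor of the math of gradient descent, here is a minimal Python sketch (not taken from the book) that fits a straight line y ≈ a·x + b by repeatedly stepping against the gradient of the mean squared error. The function name, learning rate and iteration count are illustrative assumptions, chosen so that this small example converges.

```python
def gradient_descent_linreg(xs, ys, lr=0.05, n_iter=2000):
    """Fit y ~ a*x + b by minimizing the mean squared error with batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # residuals of the current fit
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # partial derivatives of MSE = (1/n) * sum(r^2) with respect to a and b
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # move against the gradient
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
a, b = gradient_descent_linreg(xs, ys)
print(round(a, 3), round(b, 3))  # prints: 2.0 1.0
```

If the learning rate is too large, the iterates diverge instead of converging, which is one of the issues the calculus chapters help explain.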

3. Introduction to Mathematical Statistics. The most recent edition is dated 2019. By Robert Hogg, Joseph McKean and Allen Craig. Published by Pearson, 700+ pages. It is pretty expensive ($100), but you can access the free PDF version here. It covers many traditional topics, including Bayesian inference, non-parametric statistics, ANOVA and the bootstrap, in an academic style. The focus is on theory, but there are numerous exercises and a little bit of R code. It is definitely a solid reference if you are interested in the math behind the scenes, especially for estimation techniques and core statistical inference. It does not cover modeling aspects such as regression, nor more advanced material such as stochastic processes, Markov chains, or autoregressive time series.
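The bootstrap, one of the topics listed above, is easy to try out. Below is a minimal sketch of a percentile bootstrap confidence interval for a mean, written in Python rather than R; the data, the number of resamples and the function name are made up for illustration and are not from the book.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(sample)
    # sorted means of resamples drawn with replacement from the original sample
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
low, high = bootstrap_mean_ci(data)
print(f"mean={statistics.fmean(data):.2f}, 95% CI=({low:.2f}, {high:.2f})")
```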

4. Deep Learning. By Ian Goodfellow, Yoshua Bengio and Aaron Courville. MIT Press, 2016. The book covers the relevant math, including linear algebra, probability and information theory, mixture models, and optimization, with a focus on applications to neural networks. Read the book.

5. Machine Learning. Lecture notes for the Stanford University course CS229. Various authors, including Andrew Ng. The notes cover a wide range of topics, even an introduction to Python. To name just a few: weighted least squares, logistic regression, Newton’s method, the exponential family of distributions, generalized linear models, Gaussian discriminant analysis, naïve Bayes, Laplace smoothing, generative models (GMM, Gaussian mixture modeling), principal component analysis, and the expectation-maximization (EM) algorithm. Available online.
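As an example of one of these topics, Laplace smoothing keeps naïve Bayes from assigning zero probability to a word never seen in a class. The sketch below estimates P(word | class) as (count + 1) / (total + |V|) over a hypothetical toy vocabulary; it is an illustration, not code from the CS229 notes.

```python
from collections import Counter

def laplace_word_probs(documents, vocabulary, alpha=1.0):
    """Estimate P(word | class) from one class's documents with add-alpha (Laplace) smoothing."""
    counts = Counter(word for doc in documents for word in doc.split())
    # only words in the vocabulary contribute to the denominator
    total = sum(counts[w] for w in vocabulary)
    v = len(vocabulary)
    # (count + alpha) / (total + alpha * |V|): unseen words get a small nonzero probability
    return {w: (counts[w] + alpha) / (total + alpha * v) for w in vocabulary}

spam_docs = ["win money now", "win a prize now"]   # hypothetical training documents
vocab = ["win", "money", "now", "prize", "meeting"]
probs = laplace_word_probs(spam_docs, vocab)
print(probs["win"], probs["meeting"])
```

Here "meeting" never appears in the spam documents, yet it receives probability 1/11 instead of 0, so a single unseen word cannot zero out the whole naïve Bayes product.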

6. Linear Algebra for Data Science, with Examples in R. By Shaina Bennett (2021). A comprehensive tutorial on matrix algebra, eigenvalues, singular value decomposition (SVD), principal component analysis (PCA) and related topics. Chapters 20 to 24 focus on clustering. Available online.
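To illustrate the link between eigenvalues and PCA, here is a sketch that finds the first principal component of 2-D points by power iteration on the sample covariance matrix. It is written in Python rather than R, and the data points are invented; in practice one would usually call an SVD routine instead.

```python
import math

def first_principal_component(points, n_iter=100):
    """Leading principal direction of 2-D points via power iteration on the covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # sample covariance entries (divided by n - 1)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    v = (1.0, 0.0)  # arbitrary start vector, not orthogonal to the answer here
    for _ in range(n_iter):
        # multiply by the covariance matrix, then renormalize to unit length
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    # Rayleigh quotient v^T C v: the largest eigenvalue, i.e. the variance along v
    eigval = v[0] * (sxx * v[0] + sxy * v[1]) + v[1] * (sxy * v[0] + syy * v[1])
    return v, eigval

pts = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, variance = first_principal_component(pts)
print(direction, variance)
```

Since the points lie close to the line y = x, the direction comes out close to (0.707, 0.707), and almost all of the total variance is captured by this first component.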

7. Introduction to Probability for Data Science. By Stanley Chan (2021). Includes MATLAB and Python code. It is a classic textbook on the subject, but its last chapter, “Random Processes”, is the most interesting one, and its material is usually not found in such textbooks. It focuses on time series, spectral analysis, stationarity, and the autocorrelation structure. Solved exercises are illustrated with videos. Available online.
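The autocorrelation structure mentioned above can be estimated in a few lines. This sketch computes the sample autocorrelation r_k = Σ (x_t - x̄)(x_{t+k} - x̄) / Σ (x_t - x̄)² for small lags k; the function name and the test series are assumptions for illustration, not code from the book.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for lags 0..max_lag, normalized by the lag-0 sum of squares."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # n times the (biased) sample variance
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

# a series that flips sign at every step has a strong negative lag-1 autocorrelation
alternating = [1.0, -1.0] * 10
print([round(r, 2) for r in sample_acf(alternating, 3)])
# [1.0, -0.95, 0.9, -0.85]
```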

8. Interpretable Machine Learning. Authored and self-published by Christoph Molnar (2022). Interesting topics include proxy models, adversarial data, feature importance and feature interaction, pixel and feature attribution, prototype data, and the generation of synthetic data to augment, and thus reinforce, training sets. It is less math-heavy than the previous books in my list, but it includes many modern references for readers curious about the math. The author also discusses and compares many performance metrics. Available online.
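Permutation feature importance is one common way to measure feature importance: shuffle one feature at a time and see how much the prediction error increases. The sketch below is a minimal version with a hypothetical model that only looks at its first input, so the second feature should receive zero importance. It is not the book's code.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [model(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def model(row):
    """Hypothetical fitted model that ignores feature 1."""
    return 3.0 * row[0]

rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [model(row) for row in X]
print(permutation_importance(model, X, y))
```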

9. Probabilistic Machine Learning. By Kevin Murphy, MIT Press (2022). The level is intermediate, and it comes with Python code. There is a lot of focus on applications, especially image processing, in particular automated character recognition (mostly digits). Bayesian methods are discussed. The author gained substantial practical experience while working at Google. The book has its own webpage on GitHub, probml.ai. A draft version of the book (PDF) was available here at the time of writing. The second volume, Advanced Topics (not yet published), was also available in PDF format, here.

10. The Elements of Statistical Learning. This is the second edition (2016) of the seminal book by Hastie, Tibshirani and Friedman from Stanford University. Many new topics have been added, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for “wide” data (more features than observations). A free copy (PDF) is available here.
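The book presents least angle regression and path algorithms for the lasso. As a simpler alternative (a different technique from the ones in the book), the sketch below fits the lasso by cyclic coordinate descent with soft thresholding, the approach used by glmnet and scikit-learn. It minimizes (1/(2n))·||y - Xb||² + λ·||b||₁ without an intercept; the data and the value of λ are illustrative assumptions.

```python
def soft_threshold(rho, lam):
    """S(rho, lam) = sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: prediction error with feature j left out
            partial = [
                y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(X[i][j] * partial[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # exact minimizer of the objective along coordinate j
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# y depends only on the first feature; the second feature does not enter y
X = [[1.0, 0.5], [2.0, -1.0], [3.0, 0.3], [4.0, -0.2], [5.0, 0.4]]
y = [2.0 * row[0] for row in X]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(beta)
```

With λ = 0.1, the coefficient of the irrelevant second feature is set exactly to zero, while the first coefficient is shrunk slightly below its true value of 2. This exact zeroing is what distinguishes the lasso from ridge regression.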

Resources with Emphasis on Coding

The following resources may include some math, but typically only the minimum necessary to get you started with coding and developing your applications. The focus is on the code.

1. StatsModels. This is a vast GitHub repository of Python code, covering generalized linear models, regression, ANOVA, time series, survival analysis, contingency tables, probability distributions, empirical likelihood, and more. It assumes knowledge of the math concepts, focusing on the code only. Available online.

2. Statistics and Machine Learning in Python. An excellent Python tutorial, full of code, also on GitHub. It covers time series, dimension reduction, regression, non-linear models, resampling, ensemble methods (bagging, boosting and stacking), gradient descent, backpropagation, multilayer perceptrons (MLP), convolutional neural networks, and more. There is an interesting application: face recognition using various learning models. The focus is on enabling students to quickly solve these problems with code, so there is not a lot of math, just the minimum introduction necessary to understand each technique. Available online.

3. Scikit-learn. This is the reference for the Python Scikit-learn library. It is pretty comprehensive, covering so many topics that it is hard to provide a short description. Of course, the focus is on explaining how to use the scikit-learn functions, with illustrations and sample code, and limited math. If you click on a specific topic on the front page, say “Lasso regression”, it will show a whole section of the “book”, in this case “generalized linear models”. Available online.

4. Approaching (Almost) Any Machine Learning Problem. Self-published by Abhishek Thakur, a data scientist and the world’s first 4x Grandmaster on Kaggle. The most recent version is dated 2021. A print version is also available for a modest fee. This excellent 300-page Python tutorial covers many topics that are math-heavy in theory, with an emphasis on computer vision, but the author focuses on real problems and real data rather than on the math. Read more here.

For more books, check out our book section here. To make sure you don’t miss future articles, sign up for our newsletter here.
