
Subtitled “A Guide for Making Black Box Models Explainable,” the book was authored and self-published by Christoph Molnar in 2022 (319 pages). This is actually the second edition; the first was published in 2019. According to Google Scholar, it has been cited more than 2,500 times, so this is a popular book about a popular topic.
General Comments
The book reads like a collection, or small encyclopedia, of various methods and model performance metrics. Too many, in my opinion: you can easily get lost in this ocean of material. The author favors exhaustiveness over selectivity, though many will see this as a benefit. In my opinion, the book is better suited to machine learning developers than to decision makers or stakeholders.
Each method or metric is compared to others, with pluses and minuses, and comes with very recent references and pointers to Python or R libraries. Applications based on real or “prototype” data, with source code, are available on the author’s GitHub repository. A glossary would help a lot; one may appear in a future version of this book, which is constantly evolving.

I wish the quality of the print version were higher. Many color illustrations would benefit from being printed on better paper that does not absorb ink so much. I would have preferred to buy a PDF version with clickable links (text highlighted in red in the print version), had one been available. But the book has been thoroughly copy-edited and reviewed with the help of numerous readers, so the quality of the content and proofreading is high. The level and amount of mathematics is about right: not too much, not too advanced, but enough to give a real feel for what the techniques do.
About the Content
Many people still wonder how you make black box systems interpretable. The book addresses this question through a few recurring themes. Below is a short list of those that caught my attention:
- Proxy models: a simplified version of your system, trained to mimic the predictions of the full model and simple enough to be interpretable (see the surrogate sketch after this list).
- Adversarial data: data specifically chosen or designed to make your system fail (a weird human face that your system detects as non-human, or a weird rock that your system erroneously classifies as human). This helps you understand where your black box system shines and where it breaks down (a toy example is sketched below).
- Feature importance and feature interaction: to further understand the mechanics that make your system work. This is more powerful than looking at cross-correlation tables (see the permutation-importance sketch below).
- Pixel and feature attribution: identifying which pixels in a given image have the biggest impact on its classification, that is, the biggest impact on the output of your system (see the occlusion sketch below).
- Prototype data: for instance, a large set of handwritten digits added to your training set if your problem is digit recognition. This again helps you understand where your system shines or fails, offering insights into its inner workings.
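
To make the proxy idea concrete, here is a minimal sketch of a global surrogate model. This is my own illustration rather than code from the book: a gradient-boosted classifier plays the black box, and a shallow decision tree is trained to mimic its predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data stands in for a real problem.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# The "black box": any opaque model would do here.
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Train the proxy on the black box's *predictions*, not the true labels.
y_hat = black_box.predict(X)
proxy = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_hat)

# Fidelity: how faithfully the proxy mimics the black box.
print("fidelity:", proxy.score(X, y_hat))
print(export_text(proxy, feature_names=[f"x{i}" for i in range(8)]))
```

The printed tree is readable by a human, and the fidelity score tells you how much of the black box's behavior that readable approximation actually captures.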
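The adversarial idea can also be shown in a few lines. The sketch below is a toy example of mine, not the book's code: a fast-gradient-sign perturbation against a plain logistic regression, chosen because its loss gradient has a closed form.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# For logistic regression, the gradient of the log loss with respect
# to the input x is (p - y) * w, where p is the predicted probability
# of class 1 and w is the weight vector.
x, label = X[0], y[0]
w = model.coef_[0]
p = model.predict_proba(x.reshape(1, -1))[0, 1]
grad = (p - label) * w

# FGSM-style step along the sign of the gradient to increase the loss;
# epsilon controls how "weird" the perturbed input becomes.
eps = 0.5
x_adv = x + eps * np.sign(grad)

print("true label:       ", label)
print("prediction before:", model.predict(x.reshape(1, -1))[0])
print("prediction after: ", model.predict(x_adv.reshape(1, -1))[0])
```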
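Permutation feature importance, one of the model-agnostic tools covered in the book, is equally easy to try. This minimal sketch uses scikit-learn's `permutation_importance` helper: shuffle one feature at a time and watch how much the test score drops.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time and record the drop in test accuracy.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=20, random_state=0)

for i in np.argsort(result.importances_mean)[::-1]:
    print(f"x{i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```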
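Finally, pixel attribution can be approximated crudely by occlusion, a simpler stand-in for the gradient-based attribution methods the book discusses. In this toy sketch of mine, on scikit-learn's 8x8 digits, the pixels whose removal most reduces the predicted probability of the true class receive the highest attribution.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X, y = digits.data / 16.0, digits.target  # 8x8 grayscale digit images
model = LogisticRegression(max_iter=2000).fit(X, y)

img, label = X[0].reshape(8, 8), y[0]
base = model.predict_proba(img.reshape(1, -1))[0, label]

# Zero out one pixel at a time; the drop in predicted probability
# for the true class measures that pixel's importance.
attribution = np.zeros((8, 8))
for r in range(8):
    for c in range(8):
        occluded = img.copy()
        occluded[r, c] = 0.0
        p = model.predict_proba(occluded.reshape(1, -1))[0, label]
        attribution[r, c] = base - p

print(np.round(attribution, 3))
```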
You can find Christoph’s book on his GitHub repository, here, or on his website. It is written in R Markdown and published with BookDown.org. I like the GitHub version better than the print edition (it is also more up-to-date).
For related articles on interpretable machine learning, visit this page. My most recent article describes how to generate synthetic data to use as augmented training sets for black box systems; it is available here. A lot more is described and used in my new book. To avoid missing future articles, sign up for our newsletter, here.