Update on March 3, 2023: Version 4.0 has been released and now replaces version 3.0 on the e-Store. It contains a new full chapter on enhanced generative adversarial networks (GANs) with comparison to copula-based methods for data synthetization, with illustrations on real-life datasets.
The book has grown considerably since version 1.0. It started with synthetic data as one of the main components, while also diving into explainable AI, intuitive / interpretable machine learning, and generative AI. The book now has 272 pages (up from 156 in the first version), and the focus is clearly on synthetic data. Of course, I still discuss explainable and generative AI: these concepts are strongly related to data synthetization.
Many new chapters have been added, covering various aspects of synthetic data: in particular, working with more diversified real datasets, how to synthetize them, and how to generate high-quality random numbers with a very fast algorithm based on digits of irrational numbers. All chapters come with visual illustrations and Python code. In addition to the newly added material on agent-based modeling, you will find material about:
- GAN — generative adversarial networks applied using methods other than neural networks.
- GMM — Gaussian mixture models and alternatives based on multivariate stochastic and lattice processes.
- The Hellinger distance and other metrics to measure the quality of your synthetic data, and the limitations of these metrics.
- The use of copulas, with detailed explanations of how they work, Python code, and an application to mimicking a real dataset.
- Drawbacks associated with synthetic data, in particular a tendency to replicate algorithm bias that synthetization is supposed to eliminate (and how to avoid this).
- A technique somewhat similar to ensemble methods / tree boosting but specific to data synthetization, to further enhance the value of synthetic data when blended with real data; the goal is to make predictions more robust and applicable to a wider range of observations truly different from those in your original training set.
- Synthetizing nearest neighbor and collision graphs, locally random permutations, and shapes, plus an introduction to AI art.
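To make one of the items above more concrete, here is a minimal sketch of how the Hellinger distance can be used to compare a real feature with its synthetic counterpart. It is not taken from the book: the function names `hellinger_distance` and `hellinger_from_samples`, the choice of 20 histogram bins, and the simulated Gaussian samples are all assumptions made for illustration. The sketch uses the standard discrete definition, which gives a value between 0 (identical distributions) and 1 (no overlap).

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are non-negative weights over the same bins; they are
    normalized to sum to 1. The result lies between 0 and 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def hellinger_from_samples(real, synth, bins=20):
    """Estimate the distance for one numeric feature from two samples.

    Both samples are binned on a shared grid (hypothetical choice:
    equal-width bins spanning the combined range of the two samples).
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    lo = min(real.min(), synth.min())
    hi = max(real.max(), synth.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synth, bins=edges)
    return hellinger_distance(p, q)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, 5000)   # stand-in for a real feature
    good = rng.normal(0.0, 1.0, 5000)   # synthetic data, same distribution
    poor = rng.normal(1.5, 1.0, 5000)   # synthetic data, shifted mean
    print("good synthetic:", hellinger_from_samples(real, good))
    print("poor synthetic:", hellinger_from_samples(real, poor))
```

The estimate depends on the number of bins and on the sample size, and it looks at one feature at a time, so it says nothing about dependencies between features. This is one example of the limitations of such metrics.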
Newly added applications cover numerous data types and datasets, including ocean tides in Dublin (synthetic time series), temperatures in the Chicago area (geospatial data), and the insurance dataset (tabular data). I also included some material from the course that I teach on the subject.
For the time being, the book is available only in PDF format on my e-Store, with numerous links, backlinks, an index, a glossary, a large bibliography, and navigation features that make it easy to browse. This book is a compact yet comprehensive resource on the topic, the first of its kind. The quality of the formatting and color illustrations is unusually high. I plan on adding new books in the future: the next one will be on chaotic dynamical systems with applications. The book on synthetic data has also been accepted by a major publisher, and a print version will be available. It may take a while before it gets released, however, and the PDF version has useful features that cannot be rendered well in print or on devices such as Kindle. Once the book is published in the publisher's computer science series, the PDF version may no longer be available. You can check out the content on my GitHub repository, where the Python code, sample chapters, and datasets also reside.
The book is available immediately after purchase by credit card. If you use Google Pay or Apple Pay (accepted on our e-Store, along with standard methods), you may not need to enter your credit card information. You may ask your employer to pay for your purchase and request separate invoicing from us. Once downloaded, you can view the PDF in your browser and access all the links with one click, as in traditional web browsing. Using Control-O in Google Chrome allows you to view any document on your device in browser mode, including this PDF. The document can be viewed with all standard browsers, including Edge, as well as with PDF viewers. If you have any questions, feel free to contact Vincent Granville at vincentg@MLTechniques.com.