eBook: Synthetic Data and Generative AI, with Applications

$63.00

Synthetic data is used more and more to augment real-life datasets, enriching them and allowing black-box systems to correctly classify observations or predict values that are well outside of training and validation sets. In addition, it helps understand decisions made by obscure systems such as deep neural networks, contributing to the development of explainable AI. It also helps with unbalanced data, for instance in fraud detection. Finally, since synthetic data is not directly linked to real people or transactions, it offers protection against data leakage. Synthetic data also contributes to eliminating algorithm biases and privacy issues, and more generally, to increased security.

This book is the culmination of the author's years of research on the topic. The emphasis is on methodological aspects, including real-life datasets and original contributions, and on simplicity. This document integrates all the material from the previous book “Intuitive Machine Learning and Explainable AI”, and it also contains all but the most advanced math from the book on stochastic simulations. The author has also added more recent advances in agent-based modeling, applications to terrain generation (with animated data), geospatial statistics and new interpolation methods, enhanced generative adversarial networks (GANs) compared with copula-based synthetization, as well as synthetic universes and experimental math.

Description

This book is the culmination of the author's years of research on the topic. The emphasis is on methodological aspects and original contributions, favoring simplicity. This document integrates all the material from the previous book “Intuitive Machine Learning and Explainable AI”, and it also contains all but the most advanced math from the book on stochastic simulations. The author has also added more recent advances, with applications to terrain generation (with animated data), synthetic universes and experimental math. Experimental math is an infinite source of synthetic data for building and benchmarking new machine learning techniques. Conversely, mathematics benefits from these techniques, which help uncover new insights into the most famous unsolved math problems. The chapter on the Riemann Hypothesis illustrates this point, with new state-of-the-art research results on the subject.

Terrain generation, evolution, and morphing (video frame, chapter 13)

Topics include generative adversarial networks (GANs), computer vision, natural language processing, tabular data, time series, geospatial and sound data, supervised classification, clustering, agent-based modeling, generative models, nearest neighbors and collision graphs, data-driven inference, prediction (all regression techniques are unified under a single, easy-to-understand method), deep neural networks, modeling without response (unsupervised regression such as circle or curve fitting), constrained optimization, copulas, and more.

The author introduces a simple alternative to XGBoost, one of the most efficient ensemble methods, and applies it to an NLP problem: categorizing and ranking articles and blog posts to predict future performance. When needed, modern or new statistical learning techniques are introduced: dual confidence regions, a new test of independence, parametric bootstrap, the Rayleigh test, distribution-free logistic regression, proxy estimation and minimum contrast estimators, as well as a new prime test for strong pseudo-random number generators. Several real-life datasets are discussed in detail.

About 15% of the content is well-documented Python code. The code is also on GitHub, spread across multiple top-level folders, and is unified for the first time in this book. It constitutes a solid introduction to scientific computing.

Author, Publisher, Table of Contents

Version 4.11, published in April 2023. Author and publisher: Vincent Granville, Ph.D., founder of MLTechniques.com, a private, self-funded machine learning research lab.

The book is available in PDF format (292 pages), with numerous high-quality color illustrations and clickable links to fundamental concepts described on Wikipedia, in case you need a refresher on the basics. You can view it, for instance, in the Chrome browser: press Ctrl-O and select the book. You can then use all the navigation features and follow the links in the book with one click. To view the table of contents, list of figures and tables, bibliography, glossary and index, follow this link.

3 reviews for eBook: Synthetic Data and Generative AI, with Applications

  1. Vincent Granville

    5-star review on Amazon originally posted here by Daniel Wilson for the print version (published by Elsevier).

    “Synthetic Data and Generative AI” by Vincent Granville is a groundbreaking work that masterfully explores the intersection of synthetic data and generative models with the broader landscape of data science and artificial intelligence. Starting with foundational discussions on cloud regression and ensemble methods, and expanding into advanced realms such as explainable AI and synthetic tabular data generation using GANs and copulas, the book presents a rich tapestry of innovative techniques.

    Granville’s adept narrative is both profound and practical, seamlessly bridging theoretical advancements with their applications in real-world scenarios. The initial chapters set a high bar, introducing readers to fresh perspectives on machine learning and optimization, while subsequent sections delve into complex topics with clarity and depth, supported by practical implementations and case studies.

    This work stands as a seminal contribution to the field, not only serving as a comprehensive repository of knowledge but also acting as a catalyst for innovation in synthetic data and generative AI. Granville’s clear and engaging writing makes even the most complex concepts accessible, highlighting the transformative potential of these technologies. The book is highly recommended for a wide audience, from data scientists and machine learning practitioners to anyone intrigued by the latest developments at the cutting edge of data science and AI. Through its meticulous structure and insightful discussions, “Synthetic Data and Generative AI” inspires further exploration and innovation, making it an essential read for those seeking to push the boundaries of artificial intelligence and data science.

  2. Vincent Granville

    5-star review on Amazon originally posted here by Jimmy Blunt for the print version (published by Elsevier).

    “A comprehensive exploration of the intersection between synthetic data generation and generative artificial intelligence. The book begins with foundational topics such as cloud regression and ensemble methods, and progressively delves into advanced areas like explainable AI and synthetic data generation using Generative Adversarial Networks (GANs) and copulas.

    Granville’s approach is both profound and practical, effectively bridging theoretical advancements with real-world applications. Early chapters introduce innovative perspectives on machine learning and optimization, while later sections provide clarity on complex topics, supported by practical implementations and case studies.

    The book emphasizes the importance of numerical stability and algorithmic performance, focusing on explainable AI and interpretable machine learning. It offers simplified constructions of confidence regions without relying on traditional statistical methods, presenting alternatives to techniques like XGBoost. Granville also addresses the automation of data cleaning, advocating for straightforward solutions when feasible. Notably, the book includes dedicated chapters on synthetic data applications, such as fractal-like terrain generation and the evolution of synthetic star clusters influenced by gravitational forces.”

  3. rajiviyer (verified owner)

    In this book, Dr. Vincent Granville offers a refreshing viewpoint on Generative AI techniques, covering concepts including Machine Learning Regression, Image & Video Generation, AI-generated art, Synthetic Tabular Data Generation, and more. While the book assumes only fundamental math and analytics proficiency, readers should be prepared to invest ample attention to fully grasp the material to gain maximum benefits. Additionally, all concepts are explained through Python code conveniently accessible on GitHub.

    Overall, it serves as a brilliant & valuable resource for understanding Generative AI.
