
Synthesizing Multi-Table Databases: Model Evaluation & Vendor Comparison
Synthesizing multi-table tabular data presents challenges of its own compared to the single-table case. When the database contains date columns such as transaction or admission date, a frequent occurrence in real-world datasets, both generating high-quality synthetizations and evaluating models become even more complicated. In this article, we focus on this type of problem, comparing generated observations produced by […]
New Book: State of the Art in GenAI & LLMs — Creative Projects, with Solutions
With 23 top projects, 96 subprojects, and 6000 lines of Python code, this vendor-neutral coursebook is a goldmine for any analytics professional or AI/ML engineer interested in developing superior GenAI or LLM enterprise apps using ground-breaking technology. This is not another book discussing the same topics that you learn in bootcamps, college classes, Coursera, or […]
Probabilistic Nearest Neighbor Search: The Swiss Army Knife of GenAI
ANN (Approximate Nearest Neighbors) is at the core of fast vector search, itself central to GenAI, especially GPT and other LLMs. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthetization (improving poor synthetizations), model evaluation, […]
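The excerpt does not describe how PANN works, so the sketch below shows only the baseline that any ANN method tries to approximate: exact (brute-force) k-nearest-neighbor search, plus recall@k, a common way to score an approximate result against the exact one. The function names are hypothetical, and this is not the author's PANN algorithm.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def exact_knn(query: Vector, vectors: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k vectors closest to `query` in Euclidean distance.

    Exhaustive search: every vector is compared with the query, so the cost
    grows linearly with the number of vectors. ANN methods exist to avoid this.
    """
    scored = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in scored[:k]]

def recall_at_k(true_ids: Sequence[int], approx_ids: Sequence[int]) -> float:
    """Fraction of the true k nearest neighbors that an approximate search found."""
    return len(set(true_ids) & set(approx_ids)) / len(true_ids)
```

For example, `exact_knn((0, 0), [(0, 0), (1, 0), (5, 5), (0, 2)], 2)` returns `[0, 1]`. An approximate method would be judged by how close its `recall_at_k` stays to 1 while examining far fewer vectors.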
Synthesizing Geospatial Data with A Simple NoGAN Technique
If you regularly read my articles, you know that I developed several different techniques for data synthetization. Many are explained in detail in my upcoming book Synthetic Data and Generative AI (Elsevier), available here. It covers generative adversarial networks (GANs), copulas, agent-based modeling, methods based on interpolation, correlated noise mixtures, and more. The technique presented […]
Sound Generation in Python: Turning Your Data into Music
Not long ago, I published an article here entitled “The Sound that Data Makes”. The goal was turning data (random noise in this case) into music. The hope was that by “listening” to your data, you could gain a different kind of insight, not conveyed by visualizations or tabular summaries. This article is […]
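One simple way to "listen" to data is to map each value to a pitch and play one short tone per value. The sketch below does this with the Python standard library only (`wave`, `struct`, `math`); the pitch range of 220 to 880 Hz, the note length, and the function names are illustrative assumptions, not the method used in the article.

```python
import math
import struct
import wave
from typing import Sequence

SAMPLE_RATE = 44100  # samples per second (CD quality)

def value_to_frequency(x: float, lo: float, hi: float,
                       f_min: float = 220.0, f_max: float = 880.0) -> float:
    """Linearly map a data value in [lo, hi] to a pitch in [f_min, f_max] Hz."""
    if hi == lo:  # constant data: play every note at the lowest pitch
        return f_min
    return f_min + (x - lo) / (hi - lo) * (f_max - f_min)

def render(data: Sequence[float], note_seconds: float = 0.25,
           amplitude: float = 0.3) -> bytes:
    """One sine-wave note per data value, as little-endian 16-bit PCM samples."""
    lo, hi = min(data), max(data)
    n = int(SAMPLE_RATE * note_seconds)  # samples per note
    frames = bytearray()
    for x in data:
        freq = value_to_frequency(x, lo, hi)
        for t in range(n):
            sample = amplitude * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            frames += struct.pack("<h", int(sample * 32767))
    return bytes(frames)

def sonify(data: Sequence[float], path: str) -> None:
    """Write the rendered notes to a mono WAV file at `path`."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(render(data))
```

To sonify random noise, as in the earlier article, something like `sonify([random.gauss(0, 1) for _ in range(32)], "noise.wav")` (after `import random`) produces an 8-second file. Richer mappings could also drive note duration or volume from other columns.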
A Synthetic Stock Exchange Played with Real Money
Not only that, but you can predict (more precisely, compute with absolute certainty) what the value of any stock will be tomorrow. Transaction fees are well below 0.05% and the market, at least in the version presented here, is fair: in other words, a zero-sum game if you rely on luck alone. If instead […]
Introduction to Discrete Chaotic Dynamical Systems
Entitled “Introduction to Discrete Chaotic Dynamical Systems”, the full version in PDF format is accessible in the “Free Books and Articles” section, here. This article is an extract from my book “Gentle Introduction to Chaotic Dynamical Systems”, available here. This is chapter 2 of my upcoming book on dynamical systems and related stochastic processes, expected to be […]
Random Walks, Brownian Motions, and Related Stochastic Processes
Entitled “Random Walks, Brownian Motions, and Related Stochastic Processes”, the full version in PDF format is accessible in the “Free Books and Articles” section, here. This article is an extract from my book “Gentle Introduction to Chaotic Dynamical Systems”, available here. In about 15 pages, this crash course covers a lot more material than expected in such […]
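As a generic textbook illustration of the link named in the title (not code from the book), a simple symmetric random walk and its diffusive rescaling can be sketched as below. Time is scaled by 1/n and space by 1/√n because the walk's position after n steps has variance n; with this scaling, the endpoint has variance 1 for every n. Function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

def random_walk(n_steps: int, seed: Optional[int] = None) -> List[int]:
    """Simple symmetric random walk: start at 0, then move +1 or -1 with equal probability."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

def scaled_walk(n_steps: int, seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Rescale time by 1/n and space by 1/sqrt(n).

    By Donsker's invariance principle, the (linearly interpolated) rescaled path
    converges in distribution to standard Brownian motion on [0, 1] as n grows.
    """
    path = random_walk(n_steps, seed)
    return [(k / n_steps, s / math.sqrt(n_steps)) for k, s in enumerate(path)]
```

Plotting `scaled_walk(n)` for increasing `n` (say 100, 10,000, 1,000,000) shows the jagged lattice path turning into a Brownian-looking curve.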
New Interpolation Methods for Data Synthetization and Prediction
Entitled “New Interpolation Methods for Synthetization and Prediction”, the full version in PDF format is accessible in the “Free Books and Articles” section, here. This article is an extract from my book “Synthetic Data and Generative AI”, available here. I describe little-known original interpolation methods with applications to real-life datasets. These simple techniques are easy to implement and can […]
New Book: Synthetic Data and Generative AI
Synthetic data is increasingly used to augment real-life datasets, enriching them and allowing black-box systems to correctly classify observations or predict values that are well outside of training and validation sets. In addition, it helps us understand decisions made by opaque systems such as deep neural networks, contributing to the development of explainable AI. […]