In this video, Vincent talks about how synthetic data can be leveraged across industries to enhance predictions and to test black-box systems, leading to greater fairness and transparency in AI. Hosted by Victor Chima, co-founder of Learncrunch.com. Topics discussed include:
- How synthetic data differs from simulated data
- How to create high-quality synthetic data
- How to measure quality, and why popular metrics (e.g., the Hellinger score) should be avoided
- How synthetic data contributes to bias reduction and explainable AI
- Best practices for generating synthetic data and avoiding common pitfalls
- Overview of current techniques: GANs, Gaussian mixture models (GMMs), copulas, agent-based modeling, and noise injection
- Applications: computer vision, time series, financial data, NLP, and tabular data
- Synthetic data for benchmarking, data augmentation, imputation, and building confidence regions
- Case study: the Kaggle insurance dataset
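To make the copula technique from the list above concrete, here is a minimal sketch of Gaussian-copula synthesis for tabular data, using only NumPy. This is not code from the talk: the function name, the Spearman-to-Pearson conversion step, and the toy data are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Standard normal CDF, vectorized via math.erf (avoids a SciPy dependency).
_norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))

def synthesize(real: np.ndarray, n: int, rng=rng) -> np.ndarray:
    """Generate n synthetic rows that preserve each column's marginal
    distribution and the rank correlations between columns."""
    m, d = real.shape
    # Spearman rank correlation of the real columns.
    ranks = real.argsort(axis=0).argsort(axis=0).astype(float)
    rho = np.corrcoef(ranks, rowvar=False)
    # Convert Spearman rho to the latent Gaussian (Pearson) correlation.
    corr = 2.0 * np.sin(np.pi * rho / 6.0)
    np.fill_diagonal(corr, 1.0)
    # Sample correlated Gaussians, map them to uniforms via the normal CDF,
    # then push each column through the real data's empirical quantiles.
    g = rng.multivariate_normal(np.zeros(d), corr, size=n)
    u = _norm_cdf(g)
    synth = np.empty((n, d))
    for j in range(d):
        synth[:, j] = np.quantile(real[:, j], u[:, j])
    return synth
```

Because step 3 draws from the empirical quantiles, each synthetic column stays within the observed range of the real column, while the copula step reproduces the dependence structure, which is the core appeal of copula-based generators for tabular data.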
About the Speaker
Vincent Granville created Data Science Central (acquired by TechTarget), one of the most popular online communities for Data Science and Machine Learning. He spent over 20 years in the corporate world at Microsoft, eBay, Visa, Wells Fargo, and others, holds a Ph.D. in Mathematics and Statistics, and is a former post-doc at the University of Cambridge. He is now CEO at MLTechniques.com, a private research lab focusing on machine learning technologies, especially synthetic data and explainable AI.
Vincent has published in the Journal of Number Theory, the Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He has authored multiple books, including “Synthetic Data and Explainable AI”, available here. He lives in Washington state and enjoys doing research on spatial stochastic processes, chaotic dynamical systems, experimental math, and probabilistic number theory.