Synthetic Data in Machine Learning: What, Why, How?

In this episode, Nicolai Baldin (CEO) and Simon Swan (Machine Learning Lead) of Synthesized welcome Vincent Granville, founder of Data Science Central, to discuss synthetic data generation, share secrets about machine learning on synthetic data, examine key challenges with synthetic data, and explore using generative models to solve issues related to fairness and bias.


  • 0:00 – Introductions
  • 3:24 – How did you become interested in synthetic data?
  • 5:36 – How does the corporate world interact with synthetic data?
  • 8:31 – Problems that synthetic data can help solve
  • 18:55 – Synthetic datasets used by corporations
  • 27:55 – What is driving the interest in synthetic data?
  • 31:21 – How would you define what synthetic data actually is?
  • 38:43 – Creating and sharing high-quality synthetic data
  • 41:58 – What criteria should be used to measure synthetic data?
  • 46:02 – Challenges in scaling from standalone tables to databases
  • 49:38 – Data coverage concept and its applications
  • 51:30 – Using synthetic data to help solve biases
  • 57:13 – Fire round
  • 1:00:53 – Conclusions


🎙Nicolai Baldin — Founder & CEO, Synthesized. Nicolai leads Synthesized’s rapid growth, as a top provider of DataOps tools for software testing and data science applications, across the UK, Europe and North America. Nicolai is responsible for the direction and product strategy of Synthesized. For over 8 years, Nicolai has designed and delivered complex ML solutions used by top financial and healthcare institutions. He holds a PhD in Machine Learning and Statistics from the University of Cambridge.

🎙Simon Swan — Machine Learning Lead, Synthesized. Simon contributes to the core technology of Synthesized and is responsible for some of the development processes of the ML team. Prior to joining Synthesized in 2019, he worked in the legal and medical industries as an NLP & Machine Learning engineer. He has an academic background in Statistical Thermodynamics and Computational Linguistics from the University of Cambridge.

🎙Vincent Granville — Founder. Vincent is a pioneering data scientist and machine learning expert, co-founder of Data Science Central (acquired by TechTarget in 2020), former VC-funded executive, author and patent owner. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, CNET, and InfoSpace. Vincent is also a former post-doc at Cambridge University and the National Institute of Statistical Sciences (NISS). He has published in the Journal of Number Theory, the Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of multiple books. He lives in Washington state and enjoys doing research on stochastic processes, dynamical systems, experimental math and probabilistic number theory.

About Synthesized

Who are we? Synthesized is the development framework helping companies create optimized, safe-to-share datasets for use in machine learning, software testing and development, and analytics. Learn more about Fairlens here. For more details, visit this page.

You can find more articles on synthetic data here. To avoid missing future articles, sign up for our newsletter here.
