In this article, I explore different front-end strategies to improve a generative adversarial network (GAN) that produces poor synthetizations, in the context of tabular data generation. It is well known that tabular data is a lot more challenging than images when using deep neural networks for synthetization. An algorithm may work very well on some datasets and unexpectedly fail on others. Here, adding one feature to the same dataset caused an otherwise decent GAN to behave poorly. The new feature was highly correlated with an existing one, which caused the problem. I fixed it by first transforming the data via PCA (principal component analysis) before performing the synthetization, then applying the inverse transform post-GAN. Scaling and other transforms, such as standardization, also help.
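As a minimal sketch of the PCA front-end, assuming NumPy and numerical features only, the code below standardizes the data, rotates it onto its principal axes (keeping all components so the map stays invertible), hands the decorrelated components to the synthesizer, and maps the output back. The function names are hypothetical, and `synthesize_placeholder` is a stand-in for the GAN, not part of the original implementation.

```python
import numpy as np


def fit_pca(X):
    """Standardize X, then compute principal axes via SVD (no dimension reduction)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Z = (X - mean) / std
    # Rows of Vt are the principal axes; keeping all of them makes the map invertible.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt


def to_components(X, mean, std, Vt):
    """Project standardized data onto the principal axes (decorrelated features)."""
    return ((X - mean) / std) @ Vt.T


def from_components(C, mean, std, Vt):
    """Inverse transform: rotate back, then undo the standardization."""
    return (C @ Vt) * std + mean


def synthesize_placeholder(C, n, rng):
    """Hypothetical stand-in for the GAN: samples each component independently
    from a Gaussian with matching mean and std. Replace with your synthesizer."""
    return rng.normal(C.mean(axis=0), C.std(axis=0), size=(n, C.shape[1]))


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
# Hypothetical data: three features, the third nearly collinear with the first.
X = np.column_stack([x, rng.normal(size=1000), 2 * x + 0.05 * rng.normal(size=1000)])

mean, std, Vt = fit_pca(X)
C = to_components(X, mean, std, Vt)                # train the synthesizer on C, not X
C_synth = synthesize_placeholder(C, 1000, rng)
X_synth = from_components(C_synth, mean, std, Vt)  # back to the original feature space
```

Because all components are kept, the inverse transform recovers the original data exactly, while the synthesizer only ever sees uncorrelated inputs, which is the intent of this pre-processing step.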
In addition, when dealing with tabular data, GANs may be very sensitive to the seed, and the quality of the synthetizations obtained at each epoch varies significantly. Thus, stopping half-way can give much better results than running a fixed number of epochs. Small datasets present additional challenges. Better loss functions, such as the Wasserstein loss, may not solve the issue.
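The seed and epoch sensitivity suggests a simple selection loop: train under several seeds, score the synthetic output after every epoch, and keep the best checkpoint rather than the last one. The sketch below assumes you can wrap one training epoch and one evaluation metric as callables; `toy_train_one_epoch` and `toy_evaluate` are hypothetical placeholders, not a GAN.

```python
import numpy as np


def select_best_checkpoint(train_one_epoch, evaluate, seeds, n_epochs):
    """Train once per seed, score the synthetic output after every epoch, and
    keep the best (seed, epoch) pair instead of the last one. Lower is better."""
    best = {"score": np.inf, "seed": None, "epoch": None, "sample": None}
    history = []
    for seed in seeds:
        state = None  # the trainer initializes its model when state is None
        for epoch in range(n_epochs):
            state, sample = train_one_epoch(state, seed)
            score = evaluate(sample)
            history.append((seed, epoch, score))
            if score < best["score"]:
                best = {"score": score, "seed": seed, "epoch": epoch, "sample": sample}
    return best, history


# Hypothetical toy data and trainer, standing in for real data and a GAN.
real = np.random.default_rng(123).normal(2.0, 1.0, size=500)


def toy_train_one_epoch(state, seed):
    """Placeholder for one GAN epoch: the generator mean drifts noisily, so
    sample quality rises and falls from one epoch to the next."""
    if state is None:
        state = {"rng": np.random.default_rng(seed), "mu": 0.0}
    state["mu"] += 0.5 + state["rng"].normal(scale=0.8)
    sample = state["rng"].normal(state["mu"], 1.0, size=500)
    return state, sample


def toy_evaluate(sample):
    """Placeholder metric: absolute gap between synthetic and real means."""
    return abs(sample.mean() - real.mean())


best, history = select_best_checkpoint(toy_train_one_epoch, toy_evaluate, seeds=[0, 1, 2], n_epochs=10)
print(best["seed"], best["epoch"], round(best["score"], 3))
```

Keeping `history` also makes it easy to plot quality against epochs for each seed and see how much the results fluctuate.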
Evaluating the quality, that is, comparing the synthetic with the real data, is a problem of its own. In computer vision, this is straightforward, as you can visually compare two images. But with tabular data, many metrics fail to capture intricate patterns spanning multiple features, some numerical and some categorical. This can result in a synthetization scored as excellent while in reality being a total miss. In short, a false negative: the metric fails to detect a poor synthetization.
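To illustrate this false-negative problem, the sketch below compares two checks, assuming NumPy and numerical features: the worst per-feature Kolmogorov-Smirnov distance, which only looks at marginal distributions, and the largest gap between correlation matrices. Shuffling each column independently produces "synthetic" data with perfect marginals but no dependence between features. These metrics are illustrative choices, not the evaluation metrics from the paper.

```python
import numpy as np


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def max_feature_ks(real, synth):
    """Worst per-feature KS distance: only looks at marginal distributions."""
    return max(ks_distance(real[:, j], synth[:, j]) for j in range(real.shape[1]))


def correlation_gap(real, synth):
    """Largest absolute difference between the two correlation matrices."""
    diff = np.corrcoef(real, rowvar=False) - np.corrcoef(synth, rowvar=False)
    return float(np.max(np.abs(diff)))


rng = np.random.default_rng(1)
x = rng.normal(size=2000)
real = np.column_stack([x, x + 0.1 * rng.normal(size=2000)])  # strongly correlated pair

# "Synthetic" data: each column shuffled independently. Marginals are identical,
# but the dependence between the two features is destroyed.
synth = np.column_stack([rng.permutation(real[:, 0]), rng.permutation(real[:, 1])])

print(max_feature_ks(real, synth))   # 0.0: marginals match exactly
print(correlation_gap(real, synth))  # close to 1: joint structure lost
```

A score based only on the first check would call this synthetization perfect, even though it misses the relationship between the two features entirely.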

In this article, I cover all these issues and show how to address them. In the end, the best solution, one that works consistently on all datasets with barely any hyperparameter fine-tuning, was not a neural network but an algorithm called NoGAN, which also runs much faster. However, I also show several strategies in action that significantly improve a GAN on challenging datasets. Here, the data comes from a well-known telecom use case. In the table above:
- 2D GAN corresponds to using only two features and works well (not included here).
- Failed GAN corresponds to working with three features, with the new one causing the problem.
- Fixed GAN uses the same three features, after PCA transformation, with the best seed and the best epoch.
- NoGAN does not use any neural network. This technique is described here, along with better evaluation metrics.
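For contrast with neural approaches, here is a minimal histogram-based synthesizer that uses no neural network: it cuts each feature into quantile bins, counts the rows falling in each multivariate bin, samples occupied bins in proportion to those counts, and draws points uniformly inside them. This is an illustrative assumption written with NumPy, not the NoGAN implementation described in the paper; the function names and `n_bins=8` are hypothetical choices.

```python
import numpy as np


def fit_bins(X, n_bins):
    """Hypothetical helper: quantile bin edges per feature, plus the count of
    rows in each occupied multivariate bin (hyperrectangle)."""
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)) for j in range(X.shape[1])]
    # Bin index of every value; clip so the maximum falls in the last bin.
    idx = np.column_stack([
        np.clip(np.searchsorted(e, X[:, j], side="right") - 1, 0, n_bins - 1)
        for j, e in enumerate(edges)
    ])
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    return edges, cells, counts


def sample_bins(edges, cells, counts, n, rng):
    """Hypothetical helper: pick occupied bins proportionally to their counts,
    then draw each coordinate uniformly inside the chosen bin."""
    chosen = cells[rng.choice(len(cells), size=n, p=counts / counts.sum())]
    lows = np.column_stack([edges[j][chosen[:, j]] for j in range(len(edges))])
    highs = np.column_stack([edges[j][chosen[:, j] + 1] for j in range(len(edges))])
    return rng.uniform(lows, highs)


rng = np.random.default_rng(2)
x = rng.normal(size=3000)
# Hypothetical data with one skewed feature and one correlated pair.
X = np.column_stack([x, rng.exponential(size=3000), x + 0.1 * rng.normal(size=3000)])

edges, cells, counts = fit_bins(X, n_bins=8)
X_synth = sample_bins(edges, cells, counts, 3000, rng)
```

Finer bins reproduce the joint structure more closely but stay closer to the training rows; coarser bins do the opposite.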
The free technical paper (10 pages, including a case study and a full Python implementation with a link to GitHub) is available as article #30, here. To avoid missing future articles, sign up for our newsletter (same link).
About the Author

Vincent Granville is a pioneering GenAI scientist, co-founder at BondingAI.io, the LLM 2.0 platform for hallucination-free, secure, in-house, lightning-fast Enterprise AI at scale with zero weight and no GPU. He is also an author (Elsevier, Wiley), publisher, and successful entrepreneur with a multi-million-dollar exit. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. He completed a post-doc in computational statistics at the University of Cambridge.