The objective is two-fold. First, I introduce a 2-parameter generalization of the discrete geometric and zeta distributions. Indeed, a combination of both. It allows you to simultaneously match the variance and mean in observed data, thanks to the two parameters p and α. To the contrary, each distribution taken separately only has one parameter, and can not achieve this goal. The zeta-geometric distribution offers more flexibility, especially when dealing with unusual tails in your data. I illustrate the concept when synthesizing real-life tabular data with parametric copulas, for one of the features in the dataset: the number of children per policyholder.

Then, I show how to significantly improve grid search, and make it a viable alternative to gradient methods to estimate the two parameters p and α. The cost function — that is, the error to minimize — is the combined distance between the mean and variance computed on the real data, and the mean and variance of the target zeta-geometric distribution. Thus the mean and variance are used as proxy estimators for p and α. This technique is known as minimum contrast estimation, or moment-based estimation in statistical circles. The “smart” grid search consists of narrowing down on smaller and smaller regions of the parameter space over successive iterations.

The zeta-geometric distribution is just one example of an hybrid distribution. I explain how to design such hybrid models in general, using a very simple technique. They are useful to combine multiple distributions into a single one, leading to model generalizations with an increased number of parameters. The goal is to design distributions that are a good fit when some in-between solutions are needed to better represent the reality.

Download the Article
The technical article, entitled Smart Grid Search for Faster Hyperpameter Tuning, is accessible in the “Free Books and Articles” section as paper #22, here. It contains links to my GitHub files, to easily copy and paste the code. The text highlighted in orange in this PDF document are keywords that will be incorporated in the index, when I aggregate all my related articles into books about machine learning, visualization and Python. The text highlighted in blue corresponds to external clickable links, mostly references. And red is used for internal links, pointing to a section, bibliography entry, equation, and so on.
To not miss future articles, sign-up to our newsletter, here.
About the Author

Vincent Granville is a pioneering GenAI scientist, co-founder at BondingAI.io, the LLM 2.0 platform for hallucination-free, secure, in-house, lightning-fast Enterprise AI at scale with zero weight and no GPU. He is also author (Elsevier, Wiley), publisher, and successful entrepreneur with multi-million-dollar exit. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. He completed a post-doc in computational statistics at University of Cambridge.