The full version, entitled “Advanced Machine Learning with Basic Excel”, is available in PDF format in the “Free Books and Articles” section as paper #11, here. The method is also discussed in detail, with Python code, in chapter 2 of my book “Intuitive Machine Learning and Explainable AI”, available here.
I discuss ensemble methods that combine many mini decision trees, blended with regression, explained in plain English with both Excel and Python implementations. The case study is a natural language processing (NLP) problem. It is ideal reading for professionals who want to start light with machine learning (say, with Excel) and move quickly to much more advanced material and Python. The Python code is not just a call to black-box functions, but a full-fledged, detailed procedure in its own right. This algorithm belongs to the same category as boosting, bagging, stacking, and AdaBoost.

Abstract
The method described here illustrates the concept of ensemble methods, applied to a real-life NLP problem: ranking articles published on a website to predict the performance of future blog posts yet to be written, and to help choose titles and other features that maximize traffic volume and quality, and thus revenue. The method, called hidden decision trees (HDT), implicitly builds a large number of small, usable (possibly overlapping) decision trees. Observations that don’t fit in any usable node are classified with an alternate method, typically a simplified logistic regression.
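The core HDT idea described above can be sketched in a few lines of Python. This is a minimal toy illustration, not the article's actual code: each observation is reduced to a combination of categorical features (a "node"), nodes with enough training support are treated as usable, and the `MIN_SUPPORT` threshold and the global-average fallback (standing in for the simplified logistic regression) are hypothetical simplifications.

```python
from collections import defaultdict

# Hypothetical threshold: a node is "usable" if seen at least this often.
MIN_SUPPORT = 2

def train_hdt(rows):
    """rows: list of (feature_tuple, response).
    Returns (node_scores, fallback). Each feature tuple is a 'node' of an
    implicit mini decision tree; usable nodes get their average response."""
    sums, counts = defaultdict(float), defaultdict(int)
    for node, y in rows:
        sums[node] += y
        counts[node] += 1
    node_scores = {n: sums[n] / counts[n]
                   for n in counts if counts[n] >= MIN_SUPPORT}
    # Stand-in for the secondary model (simplified logistic regression
    # in the article): here, just the global average response.
    fallback = sum(y for _, y in rows) / len(rows)
    return node_scores, fallback

def predict_hdt(node, node_scores, fallback):
    # Usable node: score from the implicit mini tree; otherwise fall back.
    return node_scores.get(node, fallback)
```

In the article's NLP case study, a node would be a combination of article features (keyword category, title length, and so on), and the response a traffic or quality score.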
This hybrid procedure offers the best of both worlds: decision tree combos and regression models. It is intuitive and simple to implement. The code is written in Python, and I also offer a light version in basic Excel. The interactive Excel version is targeted at analysts interested in learning Python or machine learning. HDT fits in the same category as bagging, boosting, stacking, and AdaBoost. This article encourages you to understand all the details, upgrade the technique if needed, and play with the full code or spreadsheet as if you had written it yourself. This is in contrast with using black-box Python functions without understanding their inner workings and limitations. Finally, I discuss how to build model-free confidence intervals for the predicted values.
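To give a flavor of what "model-free" means here, the following is a generic percentile-based sketch, not necessarily the exact construction in the article: with no distributional assumptions, an interval for a predicted value can be read off the empirical distribution of the observed responses, trimming a fixed fraction on each side.

```python
def empirical_interval(values, level=90):
    """Model-free interval from empirical percentiles of observed values.
    `level` is an integer confidence percentage (e.g. 90). Generic sketch:
    sort the observations and trim (100 - level)/2 percent on each side."""
    xs = sorted(values)
    n = len(xs)
    # Number of points trimmed on each side (integer arithmetic).
    k = n * (100 - level) // 200
    return xs[k], xs[n - 1 - k]
```

For example, with 100 observed scores and `level=90`, five points are trimmed on each side, and the interval runs from the 6th smallest to the 6th largest value. No normality or model assumption is needed, which is the appeal of such intervals.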
Table of Contents
- Methodology
  - How hidden decision trees (HDT) work
  - NLP case study: summary and findings
  - Parameters
  - Improving the methodology
- Implementation details
  - Correcting for bias
    - Time-adjusted scores
  - Excel spreadsheet
  - Python code and dataset
- Model-free confidence intervals and perfect nodes
  - Interesting asymptotic properties of confidence intervals
Download the Article
The technical article is available in the “Free Books and Articles” section as paper #11, here. The text highlighted in orange in this PDF document consists of keywords that will be incorporated into the index when I aggregate all my related articles into a single book about innovative machine learning techniques. The text highlighted in blue corresponds to external clickable links, mostly references. And red is used for internal links, pointing to sections, bibliography entries, equations, and so on.
To avoid missing future articles, sign up for our newsletter, here.
About the Author

Vincent Granville is a pioneering GenAI scientist, co-founder at BondingAI.io, the LLM 2.0 platform for hallucination-free, secure, in-house, lightning-fast Enterprise AI at scale with zero weight and no GPU. He is also an author (Elsevier, Wiley), publisher, and successful entrepreneur with a multi-million-dollar exit. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. He completed a post-doc in computational statistics at the University of Cambridge.