# Advanced Machine Learning with Basic Excel: Simple Alternative to XGBoost

The full version of this article, entitled “Advanced Machine Learning with Basic Excel”, is available in PDF format in the “Free Books and Articles” section, here. The method is also discussed in detail, with Python code, in chapter 2 of my book “Intuitive Machine Learning and Explainable AI”, available here.

I discuss ensemble methods combining many mini decision trees, blended with regression, explained in simple English with both Excel and Python implementations. The case study is a natural language processing (NLP) problem. This is ideal reading for professionals who want to start light with machine learning (say, with Excel) and move quickly to much more advanced material and Python. The Python code is not just a call to some black-box functions, but a full-fledged, detailed procedure in its own right. This algorithm is in the same category as boosting, bagging, stacking and AdaBoost.

## Abstract

The method described here illustrates the concept of ensemble methods, applied to a real-life NLP problem: ranking articles published on a website in order to predict the performance of future blog posts yet to be written, and to help decide on the title and other features that maximize traffic volume and quality, and thus revenue. The method, called hidden decision trees (HDT), implicitly builds a large number of small, usable (possibly overlapping) decision trees. Observations that don’t fit in any usable node are classified with an alternate method, typically simplified logistic regression.

This hybrid procedure offers the best of both worlds: decision tree combos and regression models. It is intuitive and simple to implement. The code is written in Python, and I also offer a light version in basic Excel. The interactive Excel version is aimed at analysts interested in learning Python or machine learning. HDT fits in the same category as bagging, boosting, stacking and AdaBoost. This article encourages you to understand all the details, upgrade the technique if needed, and play with the full code or spreadsheet as if you wrote it yourself. This is in contrast with using black-box Python functions without understanding their inner workings and limitations. Finally, I discuss how to build model-free confidence intervals for the predicted values.
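To make the hybrid idea concrete, here is a minimal sketch. It is not the code from the article or the book. It assumes each observation is described by a tuple of binary features (for instance, whether a title contains a given keyword), that a node is one feature combination, and that a node is "usable" when it holds at least `min_size` observations. The names `fit_hdt`, `predict`, `min_size` and the toy data are hypothetical. For brevity, the fallback for observations outside usable nodes is the global mean of the response, a stand-in for the simplified logistic regression mentioned above.

```python
from collections import defaultdict
from statistics import mean

def fit_hdt(rows, min_size=2):
    """Group observations by feature combination (one node per combination).

    rows: list of (features, response) pairs, where features is a tuple.
    Returns (nodes, fallback): the average response of each usable node,
    and a fallback value for everything else.
    """
    groups = defaultdict(list)
    for features, response in rows:
        groups[features].append(response)
    # Keep only nodes with enough observations to be considered usable
    # (the min_size threshold is an assumption of this sketch).
    nodes = {key: mean(values) for key, values in groups.items()
             if len(values) >= min_size}
    # Stand-in for the alternate method: global mean of the response,
    # NOT the simplified logistic regression named in the text.
    fallback = mean(response for _, response in rows)
    return nodes, fallback

def predict(model, features):
    """Use the node average if the node is usable, else the fallback."""
    nodes, fallback = model
    return nodes.get(features, fallback)

# Hypothetical toy data: (has_python, has_excel, long_title) -> traffic score
rows = [
    ((1, 0, 0), 3.0),
    ((1, 0, 0), 5.0),
    ((0, 1, 0), 1.0),
    ((0, 1, 0), 2.0),
    ((0, 1, 0), 3.0),
    ((1, 1, 1), 9.0),   # single observation: node too small to be usable
]

model = fit_hdt(rows, min_size=2)
print(predict(model, (1, 0, 0)))  # served by a usable node
print(predict(model, (1, 1, 1)))  # served by the fallback
```

Raising `min_size` sends more observations to the fallback, while lowering it trusts smaller nodes; in a fuller version the fallback would be a fitted regression model rather than a constant.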

1. Methodology
    - How hidden decision trees (HDT) work
    - NLP case study: summary and findings
    - Parameters
    - Improving the methodology
2. Implementation details
    - Correcting for bias
        - Time-adjusted scores
    - Python code and dataset
3. Model-free confidence intervals and perfect nodes
    - Interesting asymptotic properties of confidence intervals