Here I share my roadmap for the next 12 months. While I am also looking for external contributors and authors to add more variety, my focus — as far as my technical content is concerned — is to complete the following projects and publish the material on this platform. All my blog posts will be available to everyone. Some technical papers (in PDF format) may be offered to subscribers only (you can subscribe here). My plan is to also produce books focusing on specific topics, covering material from several articles in a self-contained unified package. They will be available on our e-Store.
I am working full time on this project. Unlike someone working for an organization or even a consultant, there are no restrictions on the intellectual property that I can share. This initiative is entirely self-funded, which guarantees neutrality. The data sets at my disposal, to test my methods, are freely available and huge. Most of my articles are offered with portions of my master data set, allowing you to fully replicate the results.
Various themes will be covered; they are discussed below.
Synthetic data. I have over 20 years of experience generating and working with simulated data, emulating a large class of real data, spanning from spatial processes, clustering, shapes, to multivariate financial processes and auto-correlated time series. Synthetic data offers a lot of possibilities to test or benchmark algorithms, and train machine learning systems.
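As a small illustration of what an auto-correlated synthetic time series can look like, here is a minimal sketch that simulates a first-order autoregressive process and checks its lag-1 autocorrelation. The AR(1) model, the function names, and the parameter values are assumptions chosen for this example; they are not taken from my data sets or articles.

```python
import random

def simulate_ar1(n, rho, sigma=1.0, seed=0):
    """Simulate x[t] = rho * x[t-1] + Gaussian noise (illustrative AR(1) model)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, sigma))
    return x

def lag1_autocorrelation(x):
    """Empirical correlation between consecutive observations."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

series = simulate_ar1(n=5000, rho=0.8, seed=42)
print(round(lag1_autocorrelation(series), 2))  # should be close to rho
```

Because the true autocorrelation is known by construction, a series like this can serve as a benchmark: an estimation method that cannot recover a value close to rho here is unlikely to do better on real data.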
Shape catalog. I have been working on image analysis since the late eighties. My plan is to offer a large catalog of categorized synthetic shapes to help you create your own training set, to use in computer vision, image and sound recognition problems.
Regression techniques, decision trees. The purpose is to offer simple, robust alternatives to traditional models, easy to implement and control, even in Excel. It will include a generic regression technique based on the fixed-point algorithm (no need to know matrix algebra), fuzzy regression, and a blending of regression with a large number of small decision trees, with predictions based on a majority vote among competing techniques. The fuzzy regression offers multiple regression lines as output, rather than just one. Depending on the observation, a probability is assigned to each regression line, making the prediction “fuzzy” but more flexible. This is not restricted to linear regression; I will also discuss a simplified logistic regression.
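To give a flavor of regression without matrix algebra, here is a minimal sketch of one possible fixed-point scheme: each coefficient is updated in turn to its best value given the others, and the sweep is repeated until the coefficients stop changing. This is ordinary coordinate-wise least squares (a Gauss-Seidel iteration), used here as a stand-in; it is not necessarily the technique that the upcoming articles will present, and the function name and iteration count are assumptions.

```python
def fixed_point_regression(X, y, n_iter=200):
    """Least-squares coefficients via repeated one-coefficient-at-a-time updates.

    X is a list of rows (include a column of 1s for the intercept).
    Assumes no column of X is entirely zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Best change in beta[j] with all other coefficients held fixed.
            delta = sum(X[i][j] * residual[i] for i in range(n)) / col_sq[j]
            beta[j] += delta
            for i in range(n):
                residual[i] -= delta * X[i][j]
    return beta

# Noise-free toy data generated from y = 2 + 3x.
xs = list(range(10))
X = [[1.0, float(x)] for x in xs]
y = [2.0 + 3.0 * x for x in xs]
beta = fixed_point_regression(X, y)
print([round(b, 4) for b in beta])
```

Each update uses only sums and one division, so the same steps can be reproduced with spreadsheet formulas.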
Clustering and classification. I have been working on these problems for decades. My upcoming articles will feature the most recent developments: clustering in GPU (graphics processing unit) using image filtering techniques and equalizers, as well as fuzzy classification. Some of this content is already featured in my recent book, available here. GPU classification is illustrated here. Note that this is applied to standard, tabular data, not images. The data is mapped onto an image to allow easy processing, but the data itself does not consist of images: this is the originality of the technique.
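The idea of mapping tabular data onto an image and then filtering it can be sketched as follows. Labeled observations with two features are placed on a pixel grid, and a 3 x 3 majority filter is applied repeatedly so that labels spread into the empty pixels. This is a simplified, CPU-only illustration under my own assumptions (features rescaled to [0, 1), a hypothetical grid size, a plain majority filter); it is not the GPU implementation referenced above.

```python
from collections import Counter

def classify_on_grid(points, labels, size=20, n_passes=40):
    """Map (x, y) points in [0, 1) to pixels, then spread labels by local vote."""
    grid = [[None] * size for _ in range(size)]
    for (x, y), label in zip(points, labels):
        grid[int(y * size)][int(x * size)] = label
    for _ in range(n_passes):
        new_grid = [row[:] for row in grid]
        for r in range(size):
            for c in range(size):
                if grid[r][c] is not None:
                    continue  # already labeled, leave unchanged
                # Count the labels found in the 3 x 3 window around (r, c).
                votes = Counter(
                    grid[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, size))
                    for cc in range(max(c - 1, 0), min(c + 2, size))
                    if grid[rr][cc] is not None
                )
                if votes:
                    new_grid[r][c] = votes.most_common(1)[0][0]
        grid = new_grid
    return grid

def predict(grid, x, y):
    """Classify a new observation by reading the label of its pixel."""
    size = len(grid)
    return grid[int(y * size)][int(x * size)]

points = [(0.1, 0.1), (0.2, 0.15), (0.8, 0.9), (0.9, 0.8)]
labels = ["A", "A", "B", "B"]
grid = classify_on_grid(points, labels)
print(predict(grid, 0.05, 0.05), predict(grid, 0.95, 0.95))
```

Once the grid is filled, classifying a new observation is a single pixel lookup, and the grid itself can be displayed as an image to inspect the decision regions.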
Data animations, sound, “no code” machine learning. This section encompasses visualization and goes one step further, with the production of videos and animated GIFs. A lot can be done with a few clicks, using Excel or with some simple calls to video libraries in R or Python, using minimalist code. Also, the plan is to add data-induced sound (matching the summarized data) to the video, with sound frequency, amplitude, duration, and texture serving as extra dimensions. Finally, an article will discuss the generation of optimum palettes either for classification purposes, or for images with a large number of colors.
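As an illustration of data-induced sound, the following sketch maps each data value to the frequency of a short sine tone and writes the result as a WAV file using only the Python standard library. The frequency range, note duration, sample rate, and function names are all assumptions made for this example; amplitude and duration could be driven by other data columns in the same way.

```python
import math
import struct
import wave

def sonify(values, f_min=220.0, f_max=880.0, duration=0.25, rate=8000, volume=0.5):
    """Turn each value into a sine tone; larger values give higher pitch."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    per_note = int(rate * duration)
    samples = []
    for v in values:
        freq = f_min + (v - lo) / span * (f_max - f_min)
        for k in range(per_note):
            s = volume * math.sin(2 * math.pi * freq * k / rate)
            samples.append(int(s * 32767))  # scale to 16-bit signed range
    return samples

def write_wav(target, samples, rate=8000):
    """Write mono 16-bit samples to a file path or a binary file object."""
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

samples = sonify([1, 2, 3, 4])
print(len(samples))
```

Calling `write_wav("data.wav", samples)` saves the tones to disk (the file name is only an example), after which the audio track can be combined with a video produced from the same data.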
Explainable AI and very deep neural networks. The goal is to design automated black-box systems that are interpretable. An example is my shape classifier, which does not rely on neural networks. It is available here. By contrast, my new classifier uses 250 layers in a very deep neural network. Yet, it is a very sparse network, with one connection per neuron, producing an unusually granular classification. Because it is based on image filtering techniques (even though the data has nothing to do with image processing), it is easy to fine-tune and interpret. See it in action, here. The goal is to publish more articles related to this topic, and eventually, a book.
Probability distributions. Over the last 20 years, I have been creating and working with hundreds of probability distributions serving many purposes, such as generalized logistic, Poisson-exponential, Riemann zeta distributions, and distributions that are nowhere differentiable or defined on unusual domains (sphere, simplex). The goal is to create a catalog of the most useful ones, illustrated with applications.
Excel for machine learning. I have used Excel in many machine learning problems, sometimes in combination with Perl or Python programming, and sometimes as a stand-alone tool to solve a problem. I want to put all these spreadsheets in a unified document. Some are currently available on my GitHub repository (see here and here). The plan is to add many more, and bundle them in an easy-to-read document.
Experimental math. Topics include discrete dynamical systems (including stochastic systems), unusually clustered Brownian motions, use of machine learning techniques to attack difficult math problems or discover patterns, use of Bignum libraries, benchmarking machine learning techniques on predictable math data, designing synthetic data sets, and more. This also includes original contributions on the Riemann hypothesis and the twin prime conjecture.
Innovative machine learning. This will be the title of an upcoming book, focusing on simpler and more intuitive ways to analyze data. It will cover the following topics: cross-validation, model-free confidence regions, resampling, assessing the impact of individual features or pairs of features on predictions, minimum contrast estimation (a generic estimation technique), optimization with a divergent fixed-point algorithm, a covering problem based on population density rather than area, a true test of independence to detect subtle departures from full independence, time series with long-range autocorrelations, NLP and taxonomy creation, data science with the naked eye, modern regression, and more. Some of this material will first appear as articles posted on MLTechniques.com. Some can be found in my previous book, here.
Off-the-beaten-path exercises. My numerous articles and books, including future ones, are peppered with original exercises that require out-of-the-box thinking, and solve interesting problems. If you are a university professor scrambling to find fresh material, you will be interested in my upcoming book featuring the most interesting part of this collection. Of course, this book is also aimed at students.
Vincent Granville, Ph.D.
Author and Publisher,
MLTechniques.com | MLTblog.com