Build and Evaluate High Performance Taxonomy-Based LLMs From Scratch
- Vincent Granville
- April 21, 2024
One obvious way to dramatically improve the quality of LLM and RAG systems is to use high-quality input sources, rather than just raw text from crawled or parsed content. Combine this with specialization: one LLM per top domain, allowing the user to customize parameters and specify the domain in addition to standard concise […]
Hallucination-Free, Self-Tuned, Fast Hierarchical LLMs with Multi-Token Embeddings
- Vincent Granville
- April 12, 2024
The new generation of RAG / LLM architecture is moving away from the original monolithic and generic OpenAI model, towards a collection of decentralized and specialized LLMs jointly organized and governed via multi-agent systems. The benefits are obvious: low latency, smaller tables (one per LLM), faster training and fine-tuning, energy efficiency, better results, with much lower […]
Probabilistic ANN: The Swiss Army Knife of GenAI
- Vincent Granville
- February 11, 2024
ANN — Approximate Nearest Neighbors — is at the core of fast vector search, itself central to GenAI, especially GPT and other LLMs. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthetization (improving poor synthetizations), model evaluation, […]
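The PANN methodology itself is not detailed in this excerpt. As background only, here is a minimal contrast between exact and approximate nearest-neighbor search; the random-subsample shortcut is an illustrative stand-in of my own, not the PANN algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 16))     # toy embedding vectors
query = rng.normal(size=16)

# Exact nearest neighbor: brute-force scan of all 1000 vectors
exact = int(np.argmin(np.linalg.norm(data - query, axis=1)))

# Crude "approximate" shortcut: scan a random 20% subsample only.
# Probabilistic in spirit, but NOT the PANN methodology from the post.
sample = rng.choice(len(data), size=200, replace=False)
approx = int(sample[np.argmin(np.linalg.norm(data[sample] - query, axis=1))])

# The approximate answer can be no closer than the exact one
print(np.linalg.norm(data[exact] - query) <= np.linalg.norm(data[approx] - query))
```

The trade-off this sketch illustrates is the one driving fast vector search: a large cut in distance computations in exchange for a small chance of missing the true nearest neighbor.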
Genome: Synthesizing DNA Sequences with LLM Techniques
- Vincent Granville
- December 8, 2023
This methodology is not focused on genome data alone. The purpose is to design a generic solution that may also work in other contexts, such as synthesizing molecules. The problem involves dealing with a large amount of “text”. Indeed, the sequences discussed here consist of letter arrangements, from an alphabet that has 5 symbols: A, […]
Easy Trick to Debias GenAI Models: Quantile Convolution
- Vincent Granville
- November 26, 2023
All of the GenAI apps that I tested, including my own, have the same problem. They cannot easily generate data outside the observation range. As an example, let’s focus on the insurance dataset discussed in my new book. I use it to generate synthetic data with GAN (generative adversarial networks) and the NoGAN models discussed […]
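To illustrate the limitation described in the excerpt, here is a sketch showing that inverse-ECDF (quantile) sampling can never produce values outside the observed range; the small-noise remedy at the end is only a rough nod to the post's title, an assumption on my part rather than the author's exact method:

```python
import numpy as np

rng = np.random.default_rng(42)
real = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # heavy-tailed "real" data

# Synthesize by inverse-ECDF sampling: draw uniform quantile levels,
# then map them through the empirical quantiles of the real data
u = rng.uniform(size=10_000)
synthetic = np.quantile(real, u)

# The synthetic values are trapped inside the observed range.
# This is exactly the debiasing problem discussed in the post.
print(synthetic.min() >= real.min(), synthetic.max() <= real.max())

# One simple remedy in the spirit of "quantile convolution" (an illustrative
# assumption, not the author's exact recipe): convolve with small noise,
# so generated values can escape the observed min/max
debiased = synthetic + rng.normal(scale=0.1 * real.std(), size=synthetic.size)
```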
NoGAN: Ultrafast Data Synthesizer – My Talk at ODSC San Francisco
- Vincent Granville
- November 16, 2023
My talk at the ODSC Conference, San Francisco, October 2023. Includes a notebook demonstration, using our open-source Python libraries. View or download the PowerPoint presentation, here. I discuss NoGAN, an alternative to standard tabular data synthetization. It runs 1000x faster than GAN, consistently delivering better results according to the most sophisticated evaluation metric, implemented here for […]
New Book: Statistical Optimization for Generative AI and Machine Learning
- Vincent Granville
- October 7, 2023
With case studies, Python code, new open source libraries, and applications of the GenAI game-changer technology known as NoGAN (194 pages). This book covers optimization techniques pertaining to machine learning and generative AI, with an emphasis on producing better synthetic data with faster methods, some not even involving neural networks. NoGAN for tabular data is […]
NoGAN: New Generation of Synthetic Data (Video)
- Vincent Granville
- September 28, 2023
My talk at the Generative AI Conference, London, September 2023. View or download the PowerPoint presentation, here. I introduce NoGAN, a new alternative to standard tabular data synthetization. It is designed to run faster by several orders of magnitude, compared to training generative adversarial networks (GAN). In addition, the quality of the generated data is […]
GenAI: Fast Data Synthetization with Distribution-free Hierarchical Bayesian Models
- Vincent Granville
- September 22, 2023
Deep learning models such as generative adversarial networks (GAN) require a lot of computing power, and are thus expensive. Also, they may not converge. What if you could produce better data synthetizations, in a fraction of the time, with explainable AI and substantial cost savings? This is what Hierarchical Deep Resampling was designed for. It […]
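The excerpt names Hierarchical Deep Resampling without specifying it. As a loose illustration of GAN-free tabular synthesis by resampling (a toy simplification of my own, not the author's algorithm), one can bucket the data into a coarse grid, then resample cells in proportion to their observed frequency:

```python
import numpy as np

rng = np.random.default_rng(1)
real = rng.normal(size=(1000, 2))           # toy "real" tabular data

# Bucket each feature into deciles, forming a coarse 2-D grid
edges = [np.quantile(real[:, j], np.linspace(0, 1, 11)) for j in range(2)]
keys = np.stack([np.clip(np.searchsorted(edges[j], real[:, j]) - 1, 0, 9)
                 for j in range(2)], axis=1)

# Sample grid cells by observed frequency, then draw uniformly inside
# each sampled cell. No neural network, no iterative training.
cells, counts = np.unique(keys, axis=0, return_counts=True)
pick = cells[rng.choice(len(cells), size=2000, p=counts / counts.sum())]
synthetic = np.column_stack([
    rng.uniform(edges[j][pick[:, j]], edges[j][pick[:, j] + 1])
    for j in range(2)
])
```

Because the generator is just frequency-weighted resampling from empirical bins, every step is explainable and the cost is a single pass over the data, which is the selling point the excerpt makes against GAN training.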
New Python Library to Evaluate AI-generated Data and Compare Models
- Vincent Granville
- September 19, 2023
Called GenAI-Evaluation, this library can be used, for instance, to assess the quality of tabular synthetic data. In this case, it measures how faithfully the synthetization mimics the real data it is derived from, by comparing the full joint empirical distributions (ECDF) attached to the two datasets. It works with both categorical and numerical features, and returns […]
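The excerpt describes comparing the full joint ECDFs of two datasets. Here is a sketch of that idea only; the function name, probe-point scheme, and KS-style max-gap statistic are my assumptions, not the library's actual API:

```python
import numpy as np

def joint_ecdf(data, points):
    # Fraction of rows of `data` dominated coordinate-wise by each probe point
    return np.mean(np.all(data[:, None, :] <= points[None, :, :], axis=2), axis=0)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 3))
synth_close = rng.normal(size=(500, 3))          # same distribution as real
synth_off = rng.normal(loc=1.0, size=(500, 3))   # shifted distribution

# Evaluate both joint ECDFs at probe points drawn from the real data,
# and report the largest gap (a multivariate, KS-style discrepancy)
probes = real[rng.integers(0, len(real), 400)]
d_close = np.max(np.abs(joint_ecdf(real, probes) - joint_ecdf(synth_close, probes)))
d_off = np.max(np.abs(joint_ecdf(real, probes) - joint_ecdf(synth_off, probes)))
print(d_close < d_off)
```

A faithful synthetization should score a small gap, while a mismatched one scores a large gap, which is the ranking behavior the excerpt attributes to the library's metric.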