Custom Enterprise LLM/RAG with Real-Time Fine-Tuning

This article features an application of xLLM to extract information from a corporate corpus, using prompts referred to as “queries”. The goal is to answer the business user’s professional queries (the user is typically an employee of the company, or someone with authorized access) with condensed, relevant pieces of information: links, examples, PDFs, tables, charts, definitions, and so on. The original xLLM technology is described in this presentation. More details are available in my new book (June 2024), available here.

Front-end diagram (zoom in for higher resolution)
Back-end diagram (zoom in for higher resolution)

The main differences from standard LLMs are:

  • No training and no neural network involved. Thus, the system is very fast and easy to fine-tune, with explainable parameters and far fewer tokens. Most tokens consist of multiple terms and are called multitokens. I also use variable-length embeddings. Cosine similarity and dot products are replaced by a customized PMI (pointwise mutual information); see the sketch after this list.
  • Parameters have a different meaning in my context. In standard architectures, they represent the weights connecting neurons, and there are billions or even trillions of them. There is no neural network here: instead, I use parametric weights governed by a few top-level parameters. The weights, explicitly specified rather than iteratively computed, are not the parameters. My architecture uses two parameter sets: frontend and backend. The former govern scoring and relevancy; they are fine-tuned in real time with no latency, either by the user or with some algorithm. A relevancy score is shown to the user for each retrieved item.
  • I don’t use vector or graph databases. Tables are stored as nested hashes and fit in memory (no GPU needed). By nested hashes, I mean key-value tables where the value may itself be a key-value table; the format is similar to JSON objects. In standard architectures, the central table stores the embeddings. Here, embeddings are one of many backend tables. In addition, there are many contextual tables (taxonomy, knowledge graph, URLs) built during crawling. This is possible because input sources are well structured, and elements of structure are recovered thanks to smart crawling.
  • The Python code does not use any library or API call, not even Pandas, NumPy, or NLTK, so you can run it in any environment without concern for library versioning. Yet it has fewer than 600 lines of code, including the real-time fine-tuning part. I plan to leverage some library functions in the future, such as auto-correct, singularize, stem, stopwords, and so on. However, home-made solutions offer more customization, such as ad-hoc stopword lists specific to each sub-LLM, for increased performance. For instance, the one-letter word ‘p’ cannot be eliminated if the sub-LLM deals with statistical concepts. The only exception to the “no library” rule is the Requests library, if you choose to download the test enterprise corpus from its GitHub location.
  • This article focuses on only one part of an enterprise corpus: the internal documentation about how to implement or integrate AI and machine learning solutions. Other parts include marketing, IT, product, sales, legal, and HR. A specific sub-LLM is built for each part, using the same architecture. The full LLM consists of these sub-LLMs, glued together with an LLM router that redirects each user prompt to the relevant parts; a prompt may span multiple sub-LLMs. For instance, “security” is found in several of them. A minimal router sketch follows this list.
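
To make these points more concrete, here is a minimal sketch of nested-hash backend tables and a PMI-based relevancy score. It is illustrative only: the multitokens, counts, table layout, and the plain PMI formula are assumptions, stand-ins for the actual xLLM tables and the customized PMI.

```python
import math

# Backend tables stored as nested hashes: key-value tables whose values may
# themselves be key-value tables, similar in spirit to JSON objects.
# All multitokens and counts below are made up for illustration.
count = {"machine learning": 42, "gradient descent": 17, "security": 8}   # multitoken frequencies
cooc = {                                                                   # co-occurrence counts
    "machine learning": {"gradient descent": 11, "security": 2},
    "gradient descent": {"machine learning": 11},
    "security": {"machine learning": 2},
}
total = sum(count.values())

def pmi(a, b):
    """Plain pointwise mutual information between two multitokens.
    The real xLLM uses a customized variant; this is a generic stand-in."""
    pa = count.get(a, 0) / total
    pb = count.get(b, 0) / total
    pab = cooc.get(a, {}).get(b, 0) / total
    return math.log(pab / (pa * pb)) if pab > 0 else float("-inf")

def relevancy(query_multitokens, item_multitokens, weight=1.0):
    """Toy relevancy score for a retrieved item: sum of positive PMI values,
    scaled by a frontend parameter ('weight') tunable in real time."""
    score = 0.0
    for q in query_multitokens:
        for t in item_multitokens:
            p = pmi(q, t)
            if p > 0:
                score += weight * p
    return score

print(relevancy(["machine learning"], ["gradient descent", "security"]))
```

The weight argument plays the role of a frontend parameter: changing it rescales the relevancy score instantly, with no retraining involved, which is the kind of real-time fine-tuning described above.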
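
Likewise, here is a minimal sketch of how an LLM router could dispatch a prompt to one or more sub-LLMs. The sub-LLM names and keyword tables are hypothetical; the actual routing logic in xLLM may differ.

```python
# Hypothetical keyword table, one entry per sub-LLM. A prompt may match several
# of them, as with "security", which appears in more than one sub-LLM.
SUB_LLM_KEYWORDS = {
    "ai_ml_docs": {"model", "integration", "deployment", "security"},
    "legal":      {"contract", "compliance", "security"},
    "hr":         {"benefits", "onboarding", "payroll"},
}

def route(prompt):
    """Return the sub-LLMs whose keyword sets overlap the prompt, best match first."""
    words = {w.strip("?.,!") for w in prompt.lower().split()}
    scores = {name: len(words & kw) for name, kw in SUB_LLM_KEYWORDS.items()}
    return [name for name, s in sorted(scores.items(), key=lambda x: -x[1]) if s > 0]

print(route("How do we handle security during model integration?"))
# -> ['ai_ml_docs', 'legal']: this prompt spans two sub-LLMs
```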

Conclusions

My custom sub-LLM, designed from scratch, does not rely on any Python library or API, and performs better than search tools available on the market in terms of speed and relevancy of the results. It offers the user the ability to fine-tune parameters in real time, and can detect user intent to deliver appropriate output. The good performance comes from the quality of the well-structured input sources, combined with smart crawling to retrieve the embedded knowledge graph and integrate it into the backend tables. Traditional tools rely mostly on tokens, embeddings, billions of parameters, and frontend tricks such as prompt engineering to fix backend issues.

On the contrary, my approach focuses on building a solid backend foundational architecture from the ground up. Tokens and embeddings are not the most important components, by a long shot. Cosine similarity and dot products are replaced by pointwise mutual information. There is no neural network and no training, just a small number of explainable parameters that are easy to fine-tune.

When you think about it, the average human being has a vocabulary of 30,000 words. Even if you add variations and other pieces of information (typos, plurals, grammatical tenses, product IDs, street names, and so on), you end up with a few million at most, not trillions. Indeed, in expensive multi-billion-parameter systems, most tokens and weights are just noise: they are rarely fetched to serve an answer. This noise is a source of hallucinations.

Top multitokens in corpus, sorted by importance

Finally, gather a large number of user queries even before you start designing your architecture, and add prompt elements to your backend tables as a source of data augmentation. It enhances the quality of your system; a minimal sketch follows.
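
As an illustration of this kind of data augmentation, the sketch below adds candidate multitokens found in past user queries to a backend frequency table. The table name, the queries, and the word-pair heuristic are assumptions for illustration, not the actual xLLM structure.

```python
# Hypothetical backend table built from the corpus: multitoken -> frequency.
backend_counts = {"machine learning": 42, "knowledge graph": 9}

# User queries collected before (or while) designing the architecture.
past_queries = ["how to deploy a machine learning model", "llm router setup"]

def augment(counts, queries, boost=1):
    """Add consecutive word pairs found in past queries as candidate multitokens."""
    for q in queries:
        words = q.lower().split()
        for pair in zip(words, words[1:]):
            multitoken = " ".join(pair)
            counts[multitoken] = counts.get(multitoken, 0) + boost
    return counts

augment(backend_counts, past_queries)
print(backend_counts["machine learning"])   # 43: corpus count, boosted by one query occurrence
```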

View article, get the code and data

The technical document is available on GitHub, here. It features detailed documentation with illustrations (5 pages) and the code (10 pages), with links to the data sources, backend tables, and code on GitHub. To get the clickable links to work, download the document and view it in a browser or PDF viewer, instead of directly on GitHub.

To not miss future versions with more features, subscribe to my newsletter, here.

About the Author

Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at MLTechniques.com and GenAItechLab.com, former VC-funded executive, author (Elsevier) and patent owner — one related to LLM. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. Follow Vincent on LinkedIn.
