Here I explain how we avoid hallucinations in our home-made Enterprise RAG/LLM. The most recent article on the topic is available here. We do it with no training and zero parameters. By zero parameters, I mean no neural network parameters: the "40B" attached to many LLMs stands for 40 billion such parameters, also called weights. We do, however, offer a few intuitive parameters that you can fine-tune in real time.
Tips to make your system hallucination-free
- We use sub-LLMs, each specific to one topic (a part of the large corpus), so mixing unrelated items is much less likely to happen (see the topic-routing sketch after this list).
- In the base version, the output returned is unaltered rather than reworded, since rewording can introduce hallucinations.
- The system shows a high-level structured summary first, with a category, tags, and agents attached to each item; the user can then click on the items he is most interested in based on that summary, reducing the risk of a misfit.
- The user can specify agents, tags, or categories in the UI, which is much more than a prompt box. He can also include negative keywords, joint keywords that must appear together in the corpus, put a higher weight on the first keyword in the prompt, or favor the most recent material in the results (see the keyword-filtering sketch after this list).
- Python NLP libraries can cause hallucinations. For instance, standard stemmers reduce "project" and "projected" to the same stem, even when they refer to different concepts. We still use these libraries, but with workarounds that prevent such issues from leading to hallucinations (see the stemming sketch after this list).
- We assign a relevancy score to each item in the prompt results, ranging from 0 to 10. If we cannot find highly relevant information in your augmented corpus, despite using a synonyms dictionary, the score will be low, telling you that the system knows this particular item is not great. You can choose not to show items with a low score, though they sometimes contain unexpectedly interesting information (the reason to keep them). See the scoring sketch after this list.
- We show links and references, all coming from reliable sources. The user can double-check in case of doubt.
- We suggest alternate keywords (related concepts) to use in your next prompt, but let the user decide which ones to use.
- When working with content generated by many users (as on Stack Overflow), detect the most trustworthy users and ignore or penalize material posted by users with low scores. Use multiple sources rather than a single user to come up with an answer (see the reputation-weighting sketch after this list).
- Look at the ratings users give to popular prompt results. Negative feedback means your LLM returns useless answers to specific prompts, either because it does not cover your entire corpus (spread across multiple silos), because it shows outdated material, or because what the user is looking for is not in your corpus (hint: update your corpus and then re-train your LLM).
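To make the sub-LLM idea more concrete, here is a minimal Python sketch of keyword-based routing to topic-specific sub-corpora. The names (SUB_CORPORA, route_prompt) and the keyword sets are hypothetical placeholders for illustration, not the actual implementation, which presumably relies on richer topic detection.

```python
# Minimal sketch (not the actual implementation) of topic routing: each sub-LLM
# covers one part of the corpus, so a prompt is matched against per-topic keyword
# sets and dispatched to the best-matching sub-corpus. All names are illustrative.

SUB_CORPORA = {
    "statistics": {"regression", "variance", "sampling", "estimator"},
    "nlp":        {"token", "embedding", "stemming", "corpus"},
    "finance":    {"portfolio", "volatility", "hedge", "yield"},
}

def route_prompt(prompt: str) -> str:
    """Return the topic whose keyword set overlaps most with the prompt."""
    words = set(prompt.lower().split())
    scores = {topic: len(words & keywords) for topic, keywords in SUB_CORPORA.items()}
    best = max(scores, key=scores.get)
    # Fall back to a default sub-corpus when nothing matches.
    return best if scores[best] > 0 else "general"

print(route_prompt("how does stemming affect token embeddings?"))  # -> nlp
```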
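The keyword-filtering options in the UI (negative keywords, joint keywords, extra weight on the first keyword, recency preference) can be illustrated with a simple scoring rule. This is a sketch under assumed data structures and weights; none of the names or values below come from the actual system.

```python
from datetime import date

# Illustrative corpus items as (text, publication date) pairs.
items = [
    ("gradient descent convergence proof", date(2021, 3, 1)),
    ("stochastic gradient descent in production pipelines", date(2024, 6, 15)),
    ("history of numerical analysis", date(2019, 1, 10)),
]

def score(text, pub_date, keywords, negative=(), joint=(),
          first_weight=2.0, recency_bonus=1.0, recency_year=2023):
    """Score one corpus item against the prompt keywords and UI filters."""
    words = set(text.lower().split())
    if any(neg in words for neg in negative):
        return 0.0                         # negative keyword: drop the item
    if joint and not all(k in words for k in joint):
        return 0.0                         # joint keywords must all appear together
    s = sum(first_weight if i == 0 else 1.0
            for i, k in enumerate(keywords) if k in words)
    if pub_date.year >= recency_year:      # crude recency rule for this sketch
        s += recency_bonus
    return s

for text, d in items:
    print(score(text, d, ["gradient", "descent"], negative=["history"]), "-", text)
```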
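Regarding the stemming pitfall ("project" vs. "projected"), one possible workaround is an exception list wrapped around the stemmer, so that words whose stems would collide with a different concept are left untouched. The sketch below uses NLTK's PorterStemmer as one example library; the exception list and function name are illustrative, not the workaround actually used in the system.

```python
from nltk.stem import PorterStemmer   # pip install nltk

stemmer = PorterStemmer()

# Words whose stemmed form would collide with a different concept in the corpus;
# these are kept verbatim instead of being reduced to a shared stem.
DO_NOT_STEM = {"projected", "projection"}

def safe_stem(word: str) -> str:
    """Stem a word unless it is on the exception list."""
    w = word.lower()
    return w if w in DO_NOT_STEM else stemmer.stem(w)

print(stemmer.stem("projected"))   # 'project'   (collides with the noun 'project')
print(safe_stem("projected"))      # 'projected' (kept distinct)
print(safe_stem("running"))        # 'run'
```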
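The 0 to 10 relevancy score can be sketched as a normalized keyword-match count that also checks a synonyms dictionary, with a user-tunable threshold for hiding low-score items. The dictionary, threshold, and sample results below are invented for illustration only.

```python
# Hypothetical synonyms dictionary: each prompt keyword maps to acceptable variants.
SYNONYMS = {
    "llm": {"llm", "language model"},
    "hallucination": {"hallucination", "confabulation"},
}

def relevancy(item_text: str, keywords: list) -> float:
    """Score an item from 0 to 10 based on keyword (or synonym) coverage."""
    text = item_text.lower()
    hits = sum(1 for k in keywords if any(v in text for v in SYNONYMS.get(k, {k})))
    return round(10 * hits / max(len(keywords), 1), 1)

MIN_SCORE = 3.0   # user-tunable; low-score items can still be shown on demand
results = [
    "How to reduce hallucination in a language model pipeline",
    "Quarterly sales figures for the retail division",
]
for r in results:
    s = relevancy(r, ["llm", "hallucination"])
    print(s, "(shown)" if s >= MIN_SCORE else "(hidden)", "-", r)
```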
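Finally, for crowd-sourced corpora, the reputation-weighting idea can be as simple as a cutoff plus log-weighting, so the answer is assembled from several reliable contributors rather than a single user. Reputation values, the cutoff, and post contents are made up for this sketch.

```python
import math

# Illustrative crowd-sourced posts with a per-user reputation score.
posts = [
    {"user": "alice", "reputation": 2100, "text": "Use parameterized queries to avoid injection."},
    {"user": "bob",   "reputation": 12,   "text": "Just concatenate the strings, it is fine."},
    {"user": "carol", "reputation": 870,  "text": "Prepared statements also help with caching."},
]

MIN_REPUTATION = 100   # posts below this cutoff are ignored (or could be down-weighted)

def trusted_posts(posts, cutoff=MIN_REPUTATION):
    """Keep posts from reputable users, ranked by log-reputation so no one dominates."""
    kept = [p for p in posts if p["reputation"] >= cutoff]
    return sorted(kept, key=lambda p: math.log(p["reputation"]), reverse=True)

for p in trusted_posts(posts):
    print(p["user"], "-", p["text"])
```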
More here. The featured image is the table of contents for the paper in question. For references regarding our game-changing GenAI technology, read this article, and check out our research books and articles, here. We post regular updates in our free newsletter. You can sign up here or use the subscription form below.
About the Author

Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at MLTechniques.com and GenAItechLab.com, former VC-funded executive, author (Elsevier) and patent owner — one related to LLM. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. Follow Vincent on LinkedIn.