Have you tried the xLLM web API? It allows you to fine-tune and debug an agentic multi-LLM in real time. The input data is part of the anonymized corporate corpus of a Fortune 100 company, dealing with AI policies, documentation, integration, best practices, references, onboarding, and so on. This demo features one sub-LLM; the full corpus is broken down into 15 sub-LLMs. You can try it here on GenAItechLab.com.
One of the goals is to return concise but exhaustive results with enough depth, using acronyms (a specific table for each sub-LLM) to map multi-tokens found in prompts but not in the corpus to multi-tokens present in the corpus. Exhaustivity is the most overlooked metric when evaluating LLMs designed for search and retrieval. Using xLLM in combination with another LLM is one of the best approaches, and each can be used to evaluate the other. Thanks to fast in-memory processing, no weights, and no training, the xLLM web API is one of a kind, with capabilities not found in any competing product, free or commercial.
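Below is a minimal sketch of the acronym-mapping idea: multi-tokens found in the prompt but absent from the corpus are remapped, via a per-sub-LLM acronym table, to equivalent multi-tokens present in the backend tables. The table contents, names (acronym_table, corpus_multitokens, normalize_prompt_tokens), and matching logic are illustrative assumptions, not the actual xLLM implementation.

```python
# Hypothetical sketch of acronym mapping: prompt multi-tokens missing from the
# corpus are redirected, via a per-sub-LLM acronym table, to corpus multi-tokens.

acronym_table = {
    "doing business as": "dba",        # prompt form -> corpus form
    "large language model": "llm",
}

corpus_multitokens = {"dba", "llm", "business metadata template"}

def normalize_prompt_tokens(prompt_multitokens):
    """Replace prompt multi-tokens missing from the corpus with their corpus equivalent."""
    normalized = []
    for mt in prompt_multitokens:
        if mt in corpus_multitokens:
            normalized.append(mt)
        elif mt in acronym_table and acronym_table[mt] in corpus_multitokens:
            normalized.append(acronym_table[mt])
        else:
            normalized.append(mt)      # keep as-is; it simply won't match
    return normalized

print(normalize_prompt_tokens(["doing business as", "business metadata template"]))
# ['dba', 'business metadata template']
```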

The full documentation is in paper 45, available here. I updated the paper, adding section 3, which explains how the web API works. I also added a glossary, an index, and a list of the top 30 features to dramatically boost LLM performance in general.
Left panel: command menu and prompt box
The left panel allows you to fine-tune the front-end parameters in real time, and to enter your prompt at the bottom: either a pre-selected query via the Seeded option, or your own prompt via the Custom option. The right panel shows the prompt results.
Initially, the right panel shows no results. After entering a prompt, click on Retrieve Docs to display them. Before trying any new prompt (except the first one), I recommend clicking the Reset button at the bottom: it restores the parameters to their default values. The Debugging option sets the parameters to extreme values, allowing you to retrieve everything xLLM is able to find. The prompt results on the right side can then be voluminous, but this is useful to determine whether items missing from the results are due to a glitch, or to parameter values that filter out some of the output. In the next version, a relevancy score will be attached to each returned item, and you will be able to display, say, only the top 10 items based on score; the user will be able to choose the maximum number of items to display in the results. The score (currently hidden) and the results depend on the parameters.
Finally, parameter values can be modified individually using the top 10 boxes on the left panel, offering custom results and real-time fine-tuning. Lower and upper bounds are specified for each parameter.
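To make these mechanics concrete, here is a small sketch of how default values, bounds, the Reset and Debugging presets, and the planned top-N score filter could fit together. The parameter names, defaults, bounds, and debug values are hypothetical, not the actual xLLM settings.

```python
# Hypothetical front-end parameter specs; names, defaults, and bounds are illustrative.
PARAM_SPECS = {
    # name: {"default": ..., "min": ..., "max": ..., "debug": ...}
    "max_items":  {"default": 10,  "min": 1,   "max": 100, "debug": 100},
    "min_pmi":    {"default": 0.2, "min": 0.0, "max": 1.0, "debug": 0.0},
    "max_tokens": {"default": 40,  "min": 10,  "max": 200, "debug": 200},
}

def reset_params():
    """'Reset' button: restore every parameter to its default value."""
    return {name: spec["default"] for name, spec in PARAM_SPECS.items()}

def debugging_params():
    """'Debugging' option: extreme values so that nothing xLLM finds is filtered out."""
    return {name: spec["debug"] for name, spec in PARAM_SPECS.items()}

def set_param(params, name, value):
    """Individual boxes on the left panel: clamp user input to the specified bounds."""
    spec = PARAM_SPECS[name]
    params[name] = min(max(value, spec["min"]), spec["max"])
    return params

def top_items(scored_items, n):
    """Planned feature: keep only the n highest-scoring items in the prompt results."""
    return sorted(scored_items, key=lambda item: item["score"], reverse=True)[:n]
```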
Right panel: prompt results
The right panel displays the prompt results. Each box represents one item, a text entity called a "card" in the UI, retrieved from the backend tables based on its relevancy to the user prompt. See the glossary in the paper for details.

In our example, two items were retrieved: ‘Business Metadata Template’ and ‘MLTxQuest Governance Badge’. For each item, the green, orange, and white fonts represent the title, the category, and the related tags, respectively. Clicking on any item shows more details: see Figure 2. You can expand it to retrieve the full raw text, in this case a JSON entry in the corpus (not shown by default). Also note the text entity ID, used to match the item back to the corpus, as well as the triggered agents, at the top of Figure 2.
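For illustration, here is a rough sketch of the fields one retrieved card carries, based on what is visible in the UI; the key names and sample values are hypothetical, not the actual backend schema.

```python
# Hypothetical card structure mirroring the fields shown in the UI.
card = {
    "entity_id": "E01234",                        # text entity ID, to match back to the corpus
    "title": "Business Metadata Template",        # green font
    "category": "Governance / Documentation",     # orange font
    "tags": ["metadata", "template", "policy"],   # white font, related tags
    "agents": ["summarize", "compliance_check"],  # triggered agents shown at the top
    "raw_text": '{"title": "Business Metadata Template", "...": "..."}',  # full JSON entry, hidden by default
}

def render_card(card, expanded=False):
    """Compact view by default; the expanded view adds agents and the raw corpus entry."""
    lines = [card["title"], card["category"], ", ".join(card["tags"])]
    if expanded:
        lines += ["agents: " + ", ".join(card["agents"]), card["raw_text"]]
    return "\n".join(lines)

print(render_card(card, expanded=True))
```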

Finally, you can check out embedding entries related to your prompt by clicking on the Show Embeddings blue box visible in Figure 1. See the top embedding entries in Figure 3, for ‘metadata template description’ using the default parameter set. The ‘word’ column shows multi-tokens extracted from the prompt, while the ‘token’ column shows multi-tokens from the backend tables related to the ‘word’ in question. Multi-tokens flagged with a (*) are contextually related to the ‘word’, rather than linked by immediate proximity alone. The PMI (pointwise mutual information) measures the strength of the association, while the leftmost column is another indicator of relevancy. The associations in question may come from different text entities, or from the knowledge graph itself in version 3. These embedding entries are useful for trying additional prompts to refine your search, or for debugging purposes.
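For reference, here is a short sketch of the standard PMI computation; the co-occurrence counting scheme (simple counts over text entities) is an assumption for illustration, and the actual xLLM backend tables may use a different variant.

```python
import math

def pmi(count_w, count_t, count_wt, total):
    """PMI = log2( p(w,t) / (p(w) * p(t)) ); positive values mean the pair
    co-occurs more often than expected under independence."""
    p_w, p_t, p_wt = count_w / total, count_t / total, count_wt / total
    return math.log2(p_wt / (p_w * p_t))

# Illustrative numbers: the pair co-occurs in 12 of 1000 text entities.
print(round(pmi(count_w=40, count_t=60, count_wt=12, total=1000), 3))  # 2.322
```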
As a side note, you can try much longer prompts; I chose a short example here for illustration purposes. Prompts with 20 tokens may generate more voluminous output in about the same amount of time (no perceptible latency).
Next steps
The following features will be added:
- Incorporation of acronyms, for instance to redirect ‘Doing Business As’ to ‘DBA’ if the former is found in a prompt, but not in the corpus.
- A second dictionary table (or alternate mechanism) for multi-tokens found in knowledge graph entities: categories, titles, tags, agents, and so on. The goal is to boost these multi-tokens, as they have more importance and are of higher quality, ultimately producing better relevancy scores.
- Working with contextual multi-tokens, consisting of non-adjacent words found together in the same text sub-entity (see the sketch after this list).
- Data augmentation and more agents, with fewer text entities lacking agents. Breaking prompts into sub-prompts. More NLP: stemming, autocorrect, and so on.
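As a sketch of the contextual multi-token idea mentioned above, the snippet below counts pairs of non-adjacent words that co-occur within the same text sub-entity; the sub-entity granularity and pair ordering are illustrative choices, not the actual xLLM design.

```python
from itertools import combinations
from collections import Counter

def contextual_pairs(sub_entities):
    """Count unordered word pairs co-occurring in a sub-entity, skipping adjacent pairs."""
    counts = Counter()
    for words in sub_entities:
        seen = set()
        for (i, w1), (j, w2) in combinations(enumerate(words), 2):
            if j - i > 1 and w1 != w2:           # non-adjacent words only
                seen.add(tuple(sorted((w1, w2))))
        counts.update(seen)
    return counts

# Hypothetical text sub-entities, each a list of words.
sub_entities = [
    ["metadata", "governance", "template", "description"],
    ["metadata", "policy", "description"],
]
print(contextual_pairs(sub_entities).most_common(3))
```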
The document with a full sample xLLM session (fine-tuning), illustrations, input sources, backend tables including embeddings, and the full xLLM Python code with a link to GitHub, is available as paper 45, here. To avoid missing future versions with more features, subscribe to my newsletter, here.
About the Author

Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at MLTechniques.com and GenAItechLab.com, former VC-funded executive, author (Elsevier), and patent owner, with one patent related to LLMs. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. Follow Vincent on LinkedIn.