[On-Device AI Chatbot] Part 5: A Chatbot That Understands Context: Implementing Local RAG and Multi-Context Switching
- TecAce Software

In Part 4, we gave our chatbot "eyes, ears, and a mouth" by integrating on-device STT and TTS. However, no matter how well a chatbot listens and speaks, it is only half-useful as a business assistant if it doesn't know your specific "domain knowledge"—like internal company regulations or specific product manuals.
Because Small Language Models (SLMs) have far fewer parameters, they have less capacity to store facts and therefore score lower on factual-knowledge benchmarks. In Part 5, we explore the detailed implementation of Local RAG (Retrieval-Augmented Generation), which lets the chatbot read and answer based solely on documents stored on the smartphone, with no external internet connection, and how we handled Multi-Context Switching across different conversational topics.

1. A Knowledge Vault in Your Phone: Introducing On-Device RAG
In situations where security guidelines strictly prohibit uploading internal company documents to external clouds, building a RAG pipeline inside the device is the safest and most effective way to inject new knowledge into an LLM.
By leveraging the recently released Google AI Edge RAG library, you can augment a small language model with data specific to your application without expensive fine-tuning. Even with 1,000 pages of information, on-device RAG can quickly surface just the few most relevant passages to feed to the model.
Within the TecAce on-device chatbot app, user-provided text files (.txt) are embedded directly on the device and converted into a database file (.bin) for storage. When a user asks a question, the app retrieves the semantically most similar paragraphs from this local embedding DB and injects them into the on-device LLM as context, allowing it to generate accurate, evidence-based answers even when completely offline.
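The retrieve-then-inject step described above can be sketched in a few lines. The snippet below is an illustrative Python sketch, not the actual Google AI Edge RAG API: the bag-of-words `embed` function stands in for a real on-device embedding model, and `retrieve` and `build_prompt` are hypothetical helper names chosen for this example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedder: bag-of-words token counts. A real pipeline would
    # run an on-device embedding model that produces dense float vectors.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, paragraphs: list[str], top_k: int = 2) -> list[str]:
    # Rank the stored paragraphs by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(paragraphs, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, paragraphs: list[str]) -> str:
    # Inject the retrieved paragraphs as grounding context for the LLM.
    context = "\n".join(retrieve(query, paragraphs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In the real app the embeddings are precomputed once and stored in the local `.bin` database, so only the query needs to be embedded at question time; the ranking and prompt-assembly logic is the same idea as above.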
2. Switching Topics Freely: Multi-Context Switching
In a real-world business environment, a single chatbot must handle numerous document topics, ranging from HR policies to technical specs. To verify this capability, the TecAce team established a multi-context environment using completely different domains as test scenarios, such as "Interior Kit Building", "Galaxy S25 Specifications", and "Tesla Manual".
Users can dynamically switch to the context file (embedding DB) of their desired domain via the in-app file browser. The title of the newly selected context is immediately updated on the main UI, so users always know which knowledge base they are currently conversing with.
The most critical issue to avoid during this process is hallucination caused by remnants of previous conversations. For instance, if a user switches from asking about the Tesla manual to the Galaxy S25 manual, the model must not confuse the two knowledge sets. To prevent this, we strictly controlled the conversational flow by ensuring that switching the context file automatically clears the chat history.
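The clear-on-switch rule amounts to a small piece of session state. The following Python sketch is purely illustrative; the `ChatSession` class and its method names are assumptions for this example, not the app's actual code.

```python
class ChatSession:
    """Minimal sketch: one active knowledge context, history wiped on switch."""

    def __init__(self) -> None:
        self.context_name: str | None = None
        self.history: list[tuple[str, str]] = []  # (role, message) pairs

    def switch_context(self, context_name: str) -> None:
        # Switching the embedding DB must also clear the chat history, so
        # answers about the previous domain cannot leak into the new one.
        self.context_name = context_name
        self.history.clear()

    def add_turn(self, role: str, message: str) -> None:
        self.history.append((role, message))
```

Tying the history reset to the context switch in one method, rather than leaving it as a separate cleanup step, is what makes the guarantee hard to break: there is no code path that loads a new knowledge base while old turns survive.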
3. From Chatbot to AI Agent: On-Device Function Calling
Function Calling is essential for evolving from a bot that simply reads documents and answers questions into a true "agent" that acts on instructions, directly controlling device features or filling out forms.
Google AI Edge's Function Calling library enables on-device language models to intelligently decide when to call predefined functions or APIs within your application. For example, if a user dictates their medical history and personal information into a healthcare app via voice, the app can use the on-device LLM to convert the voice to text, extract the relevant information, and then call application-specific functions to automatically fill out individual form fields.
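The dispatch half of that flow can be illustrated as follows. This Python sketch simulates the structured call a model might return; `set_form_field`, the `TOOLS` registry, and the JSON shape are all assumptions made for illustration, not the actual SDK interface.

```python
import json

# Hypothetical form state the agent fills in on the user's behalf.
form: dict[str, str] = {}

def set_form_field(field: str, value: str) -> None:
    # An application-specific function the model is allowed to call.
    form[field] = value

# Registry of predefined functions the model may invoke by name.
TOOLS = {"set_form_field": set_form_field}

def dispatch(model_output: str) -> None:
    # Assume the model's reply is a structured tool call such as:
    # {"name": "set_form_field", "args": {"field": "allergy", "value": "penicillin"}}
    call = json.loads(model_output)
    TOOLS[call["name"]](**call["args"])
```

The key design point is that the model never touches the UI directly: it only chooses which registered function to call and with what arguments, and the app remains in control of what those functions actually do.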

Next Episode Preview
"So, how can we be absolutely sure that this document-reading on-device chatbot is 'truthfully' speaking only about what is in the document?"
Because LLMs fundamentally generate language based on probabilities, it is incredibly difficult to verify their quality using traditional rule-based software testing methods.
In the upcoming [Part 6] How to Verify AI Quality? (Introduction to SuperVision), we will take a detailed look at 'AI SuperVision', an automated verification tool adopted by TecAce to catch the chronic generative AI issue of 'Hallucination' and objectively evaluate the chatbot's Factuality and Faithfulness.