LATEST TECH ARTICLES


The Journey to Automatically Measure LLM Performance on Smartphones – Building the On-Device LLM Tester
"You want to put AI on a smartphone?" — The Beginning of a Reckless Challenge The story of how TecAce's AI Supervision Team built the On-Device LLM Tester "Um… I'd like to automatically measure LLM performance on a smartphone." A brief silence fell over the meeting room. Our team was developing our own on-device AI chatbot. The problem was that every time we swapped models, a tester had to physically hold the phone, send prompts one by one, and time everything by hand. That p


AI That Stays Alive Even Offline: From the Field to the Store to the Campsite
AI must work even where there is no internet. TecAce On-device combines OTA updates with a hybrid offline/online architecture to deliver up-to-date knowledge anywhere — from factory floors out of signal range to campsites deep in the mountains. In this article, we explore five industry use cases where the TecAce platform can be applied, each illustrated with a concrete scenario. Case 1. An AI Manual Companion for Field Workers Field AI Companion — Factories · Shipyards · Plan


Exploring On-Device Large Language Models in Efficient AI Language Tools
Artificial intelligence is evolving fast. One of the most exciting developments is the rise of efficient AI language tools that operate directly on devices. This shift changes how businesses handle data, privacy, and speed. Instead of relying solely on cloud servers, AI can now run locally on smartphones, laptops, or edge devices. This blog dives into the world of on-device large language models, explaining what they are, why they matter, and how they can transform enterprise


Galaxy S25 vs S26: On-Device AI Performance Benchmark
Galaxy S25 vs S26: On-Device AI Performance Benchmark Reversal! (Snapdragon 8 Elite Gen 1 vs Gen 2) Does a newer chipset always guarantee faster AI performance? Based on real-world tests conducted by TecAce, we compared the on-device LLM performance of the Galaxy S25 and the Galaxy S26 to find out. Test Overview Devices Compared: Galaxy S25 (Snapdragon 8 Elite) vs. Galaxy S26 (Snapdragon 8 Elite Gen 2) Test Models: Gemma3 1B (INT4): An ultra-lightweight conversationa


The Future of On-Device AI and TecAce's Roadmap (Conclusion) 10/10
Throughout this 9-part series, we have chronicled the entire journey of developing an on-device chatbot, a solution to cloud cost and data security issues. We covered everything from selecting a Small Language Model (SLM) and applying quantization, through integrating offline STT/TTS and building local RAG, to rigorously validating quality using AI SuperVision and overcoming hardware performance constraints. In this grand finale, Part 10,


Challenging Performance Limits: Heat, Battery, and Response Speed 9/10
In Part 8, we shared how we caught hallucinations and improved response quality using 'AI SuperVision'. While making the model smarter and more accurate is a huge milestone, running it in a real-world smartphone environment (like the Galaxy S25 FE) forces us to confront harsh physical limits: thermal management, battery consumption, and latency. Unlike the limitless resources of cloud data centers, a mob


Catching Hallucinations: Analyzing SuperVision Test Results 8/10
In Part 7, we built an automated testing pipeline that bridged our on-device chatbot app inside a smartphone with the AI SuperVision server on a PC. This enabled an end-to-end flow from prompt injection and answer extraction to automated grading. We finally had an environment capable of running dozens of test cases automatically. So, what kind of report card did our on-device SLM (Gemma-2B based) receive from these


Building SuperVision: An Automated Chatbot Testing Pipeline 7/10
In Part 6, we explained the background of introducing Testworks' 'AI SuperVision' tool to objectively evaluate the chronic hallucination issues inherent in generative AI. However, to actually apply this tool to our project, we had to overcome a significant technical barrier. Our LLM chatbot operates completely offline "On-device" (inside a smartphone), whereas the AI SuperVision system evaluating it exists in a "PC


How to Verify AI Quality? (Introduction to SuperVision) 6/10
In Part 5, we explored how to inject our company's proprietary knowledge into the on-device chatbot using Local RAG (Retrieval-Augmented Generation) and Multi-Context Switching. However, equipping the chatbot with knowledge does not immediately solve all problems. "How can we be absolutely sure that this chatbot isn't fabricating answers and is truthfully speaking only about what is in the provided documents?" In Part 6,


The Ears and Mouth of a Chatbot: On-Device STT/TTS Integration 4/10
In Part 3, we explored the optimization process of compressing a massive language model to fit the constrained resources of a smartphone and boosting inference speed using the mobile NPU. Now that we have successfully embedded a fast and smart "brain" inside the device, it is time to give our chatbot the "ears and mouth" it needs to interact naturally with users. In a mobile environment, typing out long texts eve


A Chatbot That Understands Context: Implementing Local RAG and Multi-Context Switching 5/10
In Part 4, we gave our chatbot "eyes, ears, and a mouth" by integrating on-device STT and TTS. However, no matter how well a chatbot listens and speaks, it is only half-useful as a business assistant if it doesn't know your specific "domain knowledge", such as internal company regulations or specific product manuals. Because Small Language Models (SLMs) are compact, they do not perform as well


Core Technologies of Mobile AI: Quantization and NPU Optimization 3/10
In Part 2, we discussed our selection of Gemma-2B as the ideal Small Language Model (SLM) for our project and shared our experiences benchmarking CPU and GPU performance in a constrained smartphone environment. However, the initial tests revealed significant challenges: noticeable latency and out-of-memory errors. To run LLMs in real time on a mobile device held in the palm of your hand, not on a data ce


Giant Language Models in the Palm of Your Hand: Mobile SLM Selection Strategy 2/10
In Part 1, we explored how "On-Device AI" is becoming an essential paradigm for solving cloud cost and data security issues. But how can we fit massive Large Language Models (LLMs) with tens or hundreds of billions of parameters, which typically run on massive GPU racks in data centers, into a small smartphone? The answer lies in Small Language Models (SLMs). In Part 2, we will compare the most notab


Why "On-Device AI" Now? (Overview) 1/10
Over the past few years, generative AI, led by services like ChatGPT, has revolutionized our daily lives and workflows. However, behind these powerful AI services lies a common limitation: cloud dependency. The standard architecture, where user queries are sent to cloud servers and the results computed in massive data centers are sent back, inevitably introduces risks such as data privacy breaches, network latency, and exorbitant server maintenance co