Galaxy S25 vs S26: On-Device AI Performance Benchmark
- TecAce Software
- 1 day ago
Galaxy S25 vs S26: On-Device AI Performance Benchmark Reversal!
(Snapdragon 8 Elite vs. Snapdragon 8 Elite Gen 2)
Does a newer chipset always guarantee faster AI performance? To find out, we used real-world tests conducted by TecAce to compare on-device LLM performance on the Galaxy S25 and the Galaxy S26.
Test Overview
Devices Compared: Galaxy S25 (Snapdragon 8 Elite) vs. Galaxy S26 (Snapdragon 8 Elite Gen 2)
Test Models:
Gemma3 1B (INT4): An ultra-lightweight conversational AI where response speed is the key metric.
Qwen2.5 1.5B (Q8): A model designed to handle complex reasoning tasks with higher precision.
Test Scope: 108 tests in total (2 devices × 2 models × 27 prompts, with the 27 prompts spread across 11 categories).
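The 108-run figure comes from crossing devices, models, and prompts; the 11 categories group the prompts rather than multiplying the run count. A minimal sketch of that test matrix is below. The prompt IDs are hypothetical placeholders, since the report does not list its 27 prompts.

```python
from itertools import product

devices = ["Galaxy S25", "Galaxy S26"]
models = ["Gemma3 1B (INT4)", "Qwen2.5 1.5B (Q8)"]
# Hypothetical placeholder IDs: the actual 27 prompts are not published here.
prompts = [f"prompt_{i:02d}" for i in range(1, 28)]

# One run per (device, model, prompt) combination.
# The 11 categories only label the prompts, so they do not add runs.
test_matrix = list(product(devices, models, prompts))
print(len(test_matrix))  # 108
```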
Core Performance Metrics at a Glance
Contrary to the expectation that the S26 would dominate across the board, the two generations showed distinctly different strengths.
| Metric (Gemma3 1B) | Galaxy S25 | Galaxy S26 | Result |
|---|---|---|---|
| Average Latency | 5.4 s | 7.4 s | S25 wins (-37.4%) |
| Time to First Token (TTFT) | 280 ms | 238 ms | S26 wins (+15.0%) |
| Decode TPS (Text Generation) | 66.5 tok/s | 49.6 tok/s | S25 wins (-25.4%) |
| Prefill TPS (Prompt Understanding) | 83.0 tok/s | 97.3 tok/s | S26 wins (+17.2%) |
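The Result column reads as the S26's relative advantage over the S25, with the S25 as the baseline. A positive value means the S26 is better and a negative value means the S25 is better, so for lower-is-better metrics (latency, TTFT) the difference is flipped. This is a sketch based on our reading of the table, and the helper name `s26_advantage` is hypothetical. With the rounded values shown, average latency works out to -37.0% instead of the published -37.4%. The published figure was presumably computed from unrounded measurements.

```python
def s26_advantage(s25: float, s26: float, higher_is_better: bool) -> float:
    """Relative advantage of the S26 over the S25 baseline, in percent.

    Positive means the S26 is better; negative means the S25 is better.
    """
    if higher_is_better:
        delta = s26 - s25  # throughput: more tokens/s is better
    else:
        delta = s25 - s26  # time: fewer seconds/ms is better
    return round(delta / s25 * 100, 1)


# metric: (S25 value, S26 value, higher_is_better), Gemma3 1B rows from the table
gemma3_1b = {
    "Average latency (s)": (5.4, 7.4, False),
    "TTFT (ms)": (280, 238, False),
    "Decode TPS (tok/s)": (66.5, 49.6, True),
    "Prefill TPS (tok/s)": (83.0, 97.3, True),
}

results = {name: s26_advantage(*row) for name, row in gemma3_1b.items()}
for name, pct in results.items():
    print(f"{name}: {pct:+.1f}%")
```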
💡 Core Insight: The S26 is exceptionally fast at understanding the prompt (Prefill), but the S25 is actually faster at generating the resulting text (Decode).
Key Findings & Practical Recommendations
The S26's True Strength is 'Input Processing'
The generational leap of the S26 is most evident when running the Qwen2.5 model.
Prompt Understanding Speed (Prefill TPS): Improved by a massive 64.2% compared to the S25.
Time to First Token (TTFT): Shortened by 44.0% (from 670ms down to 375ms).
The Takeaway: The S26 processes prompts quickly and starts responding sooner, so it feels more responsive at first, even though the S25 finishes generating the full response faster.
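To see why "faster to start" and "faster to finish" can belong to different devices, here is a simplified model: response time is approximately TTFT plus output tokens divided by decode speed. This model is our assumption and ignores variance, thermal throttling, and per-prompt effects. It uses the Gemma3 1B figures from the table because this section does not give Qwen2.5 decode speeds. Under this model, the S26's TTFT lead only pays off for replies of a few tokens. Beyond the crossover point, the S25's higher decode rate wins.

```python
def estimated_response_time(ttft_s: float, decode_tps: float, output_tokens: int) -> float:
    """Simplified model: wait for the first token, then decode at a steady rate."""
    return ttft_s + output_tokens / decode_tps


# Gemma3 1B (INT4) figures from the benchmark table.
S25 = {"ttft_s": 0.280, "decode_tps": 66.5}
S26 = {"ttft_s": 0.238, "decode_tps": 49.6}

# Output length n at which both devices finish together:
# 0.280 + n/66.5 = 0.238 + n/49.6  =>  n = (0.280 - 0.238) / (1/49.6 - 1/66.5)
crossover = (S25["ttft_s"] - S26["ttft_s"]) / (1 / S26["decode_tps"] - 1 / S25["decode_tps"])
print(f"Crossover at about {crossover:.1f} output tokens")

for n in (5, 50, 300):
    t25 = estimated_response_time(output_tokens=n, **S25)
    t26 = estimated_response_time(output_tokens=n, **S26)
    print(f"{n:>3} tokens: S25 {t25:.2f}s, S26 {t26:.2f}s")
```

With these numbers, the crossover is at roughly 8 tokens, so typical chatbot replies favor the S25.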
Resource Usage is Mostly Similar
Peak Memory: Both devices peak at around 400 MB during on-device LLM inference, well within available memory, and the difference between them is negligible.
Initialization Time: The S25 initializes the models about 4 to 5% faster, giving it a slight edge in app cold starts.
Recommended Combinations by Use Case
Interactive Chatbots (Speed Priority): The Galaxy S25 + Gemma3 1B (INT4) combination delivers the lowest average latency, making it the best choice when overall response speed matters most.
Offline High-Quality Reasoning (Accuracy Priority): The Galaxy S26 + Qwen2.5 1.5B (Q8) combination leads in TTFT and prefill speed, making it well suited to processing pipelines with long context inputs.
Final Conclusion
"The Galaxy S26 is not simply a 'faster S25'." It is a device with entirely different characteristics: it understands prompts faster, but generates text slower. Selecting the right device and model combination tailored to your specific usage scenario is the key to maximizing on-device AI performance.
For more details, please check out the full report.

