Galaxy S25 vs S26: On-Device AI Performance Benchmark Reversal!

(Snapdragon 8 Elite vs. Snapdragon 8 Elite Gen 2)


Does a newer chipset always guarantee faster AI performance? Using real-world test data collected by TecAce, we compared on-device LLM performance on the Galaxy S25 and the Galaxy S26 to find out.


Test Overview

  • Devices Compared: Galaxy S25 (Snapdragon 8 Elite) vs. Galaxy S26 (Snapdragon 8 Elite Gen 2)


  • Test Models:

    • Gemma3 1B (INT4): An ultra-lightweight conversational AI where response speed is the key metric.

    • Qwen2.5 1.5B (Q8): A model designed to handle complex reasoning tasks with higher precision.

  • Test Scope: A total of 108 tests (2 devices × 2 models × 27 prompts, with the prompts spanning 11 categories).
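The test count can be reproduced by enumerating the test matrix. The sketch below is only an illustration: the prompt IDs are placeholders (the actual prompts are in the full report), and it assumes the 11 categories group the 27 prompts rather than multiplying the run count, since 2 × 2 × 27 already gives 108.

```python
from itertools import product

devices = ["Galaxy S25", "Galaxy S26"]
models = ["Gemma3 1B (INT4)", "Qwen2.5 1.5B (Q8)"]

# Placeholder prompt IDs (hypothetical names); the real 27 prompts
# and their 11 categories are not listed in this article.
prompt_ids = [f"prompt_{i:02d}" for i in range(1, 28)]

# One test run per (device, model, prompt) combination.
test_runs = list(product(devices, models, prompt_ids))

print(f"Total test runs: {len(test_runs)}")
```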


Core Performance Metrics at a Glance

Contrary to the expectation that the S26 would dominate across the board, the two generations showed distinctly different strengths.

| Metric (Gemma3 1B) | Galaxy S25 | Galaxy S26 | Result |
|---|---|---|---|
| Average Latency | 5.4s | 7.4s | S25 Wins (-37.4%) |
| Time to First Token (TTFT) | 280ms | 238ms | S26 Wins (+15.0%) |
| Decode TPS (Text Generation) | 66.5 tok/s | 49.6 tok/s | S25 Wins (-25.4%) |
| Prefill TPS (Prompt Understanding) | 83.0 tok/s | 97.3 tok/s | S26 Wins (+17.2%) |

💡 Core Insight: The S26 is faster at understanding the prompt (Prefill, +17.2%) and at delivering the first token (TTFT, +15.0%), but the S25 is faster at generating the resulting text (Decode) and has the lower average latency.
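The percentages in the Result column can be recomputed from the rounded Gemma3 1B figures. This is a minimal sketch, assuming the sign convention is "S26 relative to S25, positive when the S26 is better"; the helper name `relative_gain` is ours, not TecAce's. With the rounded latency values it yields about -37.0% rather than the reported -37.4%, presumably because the report used unrounded measurements.

```python
def relative_gain(s25: float, s26: float, higher_is_better: bool) -> float:
    """Percent change of the S26 relative to the S25.

    The result is positive when the S26 is better and negative when
    the S25 is better, regardless of the metric's direction.
    """
    change = (s26 - s25) / s25 * 100.0
    return change if higher_is_better else -change


# (S25 value, S26 value, higher_is_better), taken from the table above.
metrics = {
    "Average Latency (s)": (5.4, 7.4, False),
    "TTFT (ms)": (280.0, 238.0, False),
    "Decode TPS (tok/s)": (66.5, 49.6, True),
    "Prefill TPS (tok/s)": (83.0, 97.3, True),
}

for name, (s25, s26, higher_is_better) in metrics.items():
    gain = relative_gain(s25, s26, higher_is_better)
    winner = "S26" if gain > 0 else "S25"
    print(f"{name}: {winner} wins ({gain:+.1f}%)")
```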

Key Findings & Practical Recommendations

  1. The S26's True Strength is 'Input Processing'


The generational leap of the S26 is most evident when running the Qwen2.5 model.


  • Prompt Understanding Speed (Prefill TPS): 64.2% higher than on the S25.


  • Time to First Token (TTFT): Shortened by 44.0% (from 670ms down to 375ms).


  • The Takeaway: The S26 quickly comprehends prompts, making it feel much more responsive initially, even though the S25 finishes generating the full response faster.
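To see why a faster first token does not translate into a faster full response, a simple model helps. The sketch below assumes total latency ≈ TTFT + output tokens ÷ decode TPS, which is our simplification and not necessarily how TecAce measured latency, and it uses the Gemma3 1B figures from the table because those are the only absolute decode rates given here.

```python
def total_latency(ttft_s: float, decode_tps: float, output_tokens: int) -> float:
    """Simplified model: time to first token plus steady-state decoding."""
    return ttft_s + output_tokens / decode_tps


def break_even_tokens(ttft_a: float, tps_a: float,
                      ttft_b: float, tps_b: float) -> float:
    """Output length at which device A (slower start, faster decode)
    catches up with device B (faster start, slower decode)."""
    return (ttft_a - ttft_b) / (1.0 / tps_b - 1.0 / tps_a)


# Gemma3 1B figures from the table: (TTFT in seconds, decode tok/s).
S25 = (0.280, 66.5)
S26 = (0.238, 49.6)

n = break_even_tokens(*S25, *S26)
print(f"S25 catches up after about {n:.1f} output tokens")

for tokens in (5, 50, 200):
    print(tokens,
          f"S25={total_latency(*S25, tokens):.2f}s",
          f"S26={total_latency(*S26, tokens):.2f}s")
```

Under these assumptions the S26's head start lasts for only about 8 output tokens; any longer response finishes sooner on the S25.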


  2. Resource Usage is Mostly Similar


  • Peak Memory: Peak memory use is around 400MB on both devices, which leaves enough headroom for on-device LLM operation; the difference between them is negligible.

  • Initialization Time: The S25 initializes the models about 4 to 5% faster, giving it a slight edge in app cold-start experiences.


  3. Recommended Combinations by Use Case


  • Interactive Chatbots (Speed Priority): The Galaxy S25 + Gemma3 1B (INT4) combination provides the lowest latency and is the most advantageous for overall response speed.

  • Offline High-Quality Reasoning (Accuracy Priority): The Galaxy S26 + Qwen2.5 1.5B (Q8) combination excels in TTFT and Prefill speeds, making it ideal for processing pipelines that require long context inputs.
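The split between the two recommendations comes down to the ratio of prompt length to output length. The following sketch estimates end-to-end time as prompt tokens ÷ prefill TPS + output tokens ÷ decode TPS. This model, the `Throughput` class, and the token counts are our own illustrative assumptions, and the rates are the Gemma3 1B figures, since absolute Qwen2.5 throughput is not listed in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throughput:
    prefill_tps: float  # prompt tokens processed per second
    decode_tps: float   # output tokens generated per second

    def estimate_seconds(self, prompt_tokens: int, output_tokens: int) -> float:
        """Rough end-to-end time: prefill phase plus decode phase."""
        return prompt_tokens / self.prefill_tps + output_tokens / self.decode_tps


# Gemma3 1B figures from the metrics table.
GEMMA = {
    "Galaxy S25": Throughput(prefill_tps=83.0, decode_tps=66.5),
    "Galaxy S26": Throughput(prefill_tps=97.3, decode_tps=49.6),
}


def faster_device(prompt_tokens: int, output_tokens: int) -> str:
    """Return the device with the lower estimated end-to-end time."""
    return min(
        GEMMA,
        key=lambda d: GEMMA[d].estimate_seconds(prompt_tokens, output_tokens),
    )


# Hypothetical workloads: a short chat turn and a long-context task.
print(faster_device(prompt_tokens=50, output_tokens=300))
print(faster_device(prompt_tokens=2000, output_tokens=100))
```

With these figures the S26 comes out ahead only when the prompt is roughly 2.9 times longer than the output, which matches the recommendation to use it for long-context input pipelines rather than chat-style generation.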


Final Conclusion

The Galaxy S26 is not simply a "faster S25." In these tests it shows a different performance profile: it understands prompts faster but generates text more slowly. Choosing the device and model combination that fits your specific usage scenario is the key to maximizing on-device AI performance.


For more details, please check out the full report.


