Some llama.cpp B70 SYCL benchmarks
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
build: dd4623a74 (9640)

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma4 12B Q8_0 | 11.78 GiB | 11.91 B | SYCL | -1 | pp512 | 1578.19 ± 7.82 |
| gemma4 12B Q8_0 | 11.78 GiB | 11.91 B | SYCL | -1 | tg128 | 32.43 ± 0.07 |
| gemma4 26B.A4B Q8_0 | 25.00 GiB | 25.23 B | SYCL | -1 | pp512 | 1332.35 ± 8.80 |
| gemma4 26B.A4B Q8_0 | 25.00 GiB | 25.23 B | SYCL | -1 | tg128 | 40.13 ± 0.09 |
| gemma4 E2B Q8_0 | 4.69 GiB | 4.65 B | SYCL | -1 | pp512 | 5662.45 ± 23.05 |
| gemma4 E2B Q8_0 | 4.69 GiB | 4.65 B | SYCL | -1 | tg128 | 109.14 ± 0.26 |

| model | size | params | backend | ngl | ot | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------------- | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0 | 34.36 GiB | 34.66 B | SYCL | 99 | `blk\.(3[4-9])\.ffn_(gate\|up\|down)_exps=CPU` | pp512 | 563.48 ± 14.58 |
| qwen35moe 35B.A3B Q8_0 | 34.36 GiB | 34.66 B | SYCL | 99 | `blk\.(3[4-9])\.ffn_(gate\|up\|down)_exps=CPU` | tg128 | 44.67 ± 0.04 |

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q8_0 | 27.04 GiB | 27.32 B | SYCL | -1 | pp512 | 778.20 ± 0.99 |
| qwen35 27B Q8_0 | 27.04 GiB | 27.32 B | SYCL | -1 | tg128 | 15.42 ± 0.01 |

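The `ot` column on the qwen35moe run is llama.cpp's tensor-override option: a regex over tensor names plus a buffer type. Here it keeps the expert FFN weights of blocks 34 to 39 on the CPU, presumably so the rest of the 34 GiB model fits in VRAM. The Python sketch below shows which tensor names that pattern selects. It assumes llama.cpp splits the argument on `=` and does a regex *search* over GGUF tensor names. The tensor list is hypothetical: 40 blocks and a made-up subset of tensor kinds, not read from the actual model file.

```python
import re

# -ot value from the qwen35moe run: "<regex>=<buffer type>".
# Assumption: llama.cpp regex-searches each tensor name with the part before "=".
OT_ARG = r"blk\.(3[4-9])\.ffn_(gate|up|down)_exps=CPU"
regex_src, _, buffer_type = OT_ARG.rpartition("=")
OT_REGEX = re.compile(regex_src)


def offloaded(tensor_names):
    """Return the tensor names the override would place on `buffer_type`."""
    return [name for name in tensor_names if OT_REGEX.search(name)]


if __name__ == "__main__":
    # Hypothetical names in GGUF "blk.<layer>.<tensor>.weight" style; the
    # 40-layer count and the tensor kinds are illustrative assumptions.
    names = [
        f"blk.{layer}.{kind}.weight"
        for layer in range(40)
        for kind in ("attn_q", "ffn_gate_exps", "ffn_up_exps",
                     "ffn_down_exps", "ffn_up_shexp")
    ]
    hits = offloaded(names)
    print(f"{len(hits)} tensors -> {buffer_type}")
    # Which blocks had expert tensors moved off the GPU
    print(sorted({int(n.split(".")[1]) for n in hits}))
```

Only the routed-expert tensors (`*_exps`) of blocks 34 to 39 match. Attention weights and shared-expert tensors (`*_shexp`) stay on the GPU. Widening the character class (for example `3[0-9]`) moves more expert weights to the CPU.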
Just FYI: it runs OK, but it could be better.
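To read the pp512 and tg128 columns together, the Python sketch below converts them into prefill time, per-token latency and total time for a 512-token prompt followed by a 128-token reply. It is only a back-of-the-envelope estimate. It assumes the two llama-bench rates can simply be added, but llama-bench measures tg128 separately from pp512, and in practice decode speed usually drops somewhat as the context grows.

```python
# Mean t/s values copied from the tables above: (pp512, tg128).
RESULTS = {
    "gemma4 12B Q8_0": (1578.19, 32.43),
    "gemma4 26B.A4B Q8_0": (1332.35, 40.13),
    "gemma4 E2B Q8_0": (5662.45, 109.14),
    "qwen35moe 35B.A3B Q8_0 (-ot)": (563.48, 44.67),
    "qwen35 27B Q8_0": (778.20, 15.42),
}

PROMPT_TOKENS = 512  # same size as the pp512 test
GEN_TOKENS = 128     # same size as the tg128 test


def request_seconds(pp_tps: float, tg_tps: float,
                    prompt: int = PROMPT_TOKENS, gen: int = GEN_TOKENS) -> float:
    """Rough wall time for one request: prefill time plus decode time.

    Assumes decode speed stays at the tg128 rate regardless of context length.
    """
    return prompt / pp_tps + gen / tg_tps


if __name__ == "__main__":
    for name, (pp, tg) in RESULTS.items():
        prefill_ms = 1000 * PROMPT_TOKENS / pp
        per_token_ms = 1000 / tg
        total_s = request_seconds(pp, tg)
        print(f"{name:30s} prefill {prefill_ms:7.1f} ms  "
              f"{per_token_ms:6.1f} ms/token  total {total_s:5.2f} s")
```

On these numbers, decode dominates every model. The dense qwen35 27B spends about 8.3 s of its roughly 9 s total generating the 128 tokens.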