r/LocalLLaMA · May 29, 2026 · 1 min read

StepFun 3.7 Flash - Speed Benchmark on M5 Max


Just ran a benchmark with the llama.cpp branch that shipped on day 0.

M5 Max with 128 GB, Q4_K_S quant. Memory peaked at 120+ GB, which made things sluggish, but still usable once cmd+tab landed.

Short context (< 16k) feels fast and very responsive, with generation at 57+ t/s up to 8k. At 32k-64k the speed is not bad, still usable: about 45 t/s at 32k and 34 t/s at 64k.

PP	TG	B	N_KV	T_PP s	S_PP t/s	T_TG s	S_TG t/s	T s	S t/s
0	128	1	128	0.000	nan	2.038	62.80	2.038	62.80
2048	128	1	2176	1.938	1056.65	2.115	60.52	4.053	536.88
8192	128	1	8320	9.153	895.01	2.233	57.32	11.386	730.71
16384	128	1	16512	22.428	730.52	2.475	51.71	24.903	663.05
32768	128	1	32896	64.539	507.73	2.818	45.43	67.356	488.39
65536	128	1	65664	178.227	367.71	3.774	33.92	182.001	360.79
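
For anyone who wants to sanity-check the numbers, here is a rough sketch that recomputes the speed columns from the raw timings. It assumes the usual reading of these headers (PP = prompt tokens, TG = generated tokens, T_* = seconds, S_* = tokens per second), which is my interpretation and not something stated in the post. The names `ROWS`, `throughput` and `summarize` are made up for illustration.

```python
# Rows copied from the table above: (PP, TG, T_PP seconds, T_TG seconds).
ROWS = [
    (0, 128, 0.000, 2.038),
    (2048, 128, 1.938, 2.115),
    (8192, 128, 9.153, 2.233),
    (16384, 128, 22.428, 2.475),
    (32768, 128, 64.539, 2.818),
    (65536, 128, 178.227, 3.774),
]


def throughput(tokens, seconds):
    """Tokens per second, or None when no time was spent (the PP=0 row)."""
    return tokens / seconds if seconds > 0 else None


def summarize(rows):
    """Recompute S_PP, S_TG and total S for each row from the raw timings."""
    out = []
    for pp, tg, t_pp, t_tg in rows:
        out.append({
            "pp": pp,
            "s_pp": throughput(pp, t_pp),
            "s_tg": throughput(tg, t_tg),
            "s_total": throughput(pp + tg, t_pp + t_tg),
        })
    return out


if __name__ == "__main__":
    results = summarize(ROWS)
    baseline = results[0]["s_tg"]  # generation speed with an empty context
    for row in results:
        s_pp = "n/a" if row["s_pp"] is None else f"{row['s_pp']:.1f}"
        print(f"PP={row['pp']:>6}  S_PP={s_pp:>7}  S_TG={row['s_tg']:.1f}  "
              f"TG vs empty context={row['s_tg'] / baseline:.0%}")
```

The recomputed values agree with the table to within rounding of the printed timings. Read this way, generation at 64k runs at a bit over half the empty-context rate, and prompt processing at 64k is roughly a third of the 2k rate.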

Now the Pelican bench - a very nice one, but with quite a long hand lol
