r/LocalLLaMA · June 9, 2026 · 1 min read

Jetson Orin NX Build for Hermes Agent + Benchmarking

I had a huge LLM server, and now I have a tiny one! I had a Jetson Orin NX gathering dust from a long-dead robotics project, back in the Llama-7B days. With MoE and smaller models now doing well, I figured it was time to mess with it again.

Goals:

As silent as possible (given they bumped the power from 25W -> 40W)
Greater than 10 tok/s TG and 300 tok/s PP
At least 65K context for Hermes Agent
Must look cool AF 👌🏻

With those constraints, I had to take a hacksaw to the stock heatsink and make a new case. Then I tested way too many models (the expected ones, Gemma 4 and Qwen 3.6), in way too many quant variations.

It's all written up in the blog!

TL;DR: Gemma 4 26B A4B UD Q2_K_XL gives:

66K context window
14.65 tok/s at ~8k context
10.21 tok/s at ~60k context
Still does an OK job with multiple tool calls on long prompts
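
For anyone who wants to sanity-check those numbers against the goals, here is a minimal Python sketch. The variable and function names are made up for illustration, and PP is left out because the TL;DR doesn't quote a PP figure; the thresholds and results are copied straight from the lists above.

```python
# Targets taken from the goal list in the post.
MIN_TG_TOK_S = 10.0   # "Greater than 10 tok/s TG"
MIN_CONTEXT_K = 65    # "At least 65K context"

# Figures quoted in the TL;DR for Gemma 4 26B A4B UD Q2_K_XL.
context_window_k = 66
tg_by_context = {"~8k": 14.65, "~60k": 10.21}  # tok/s at each context depth


def meets_goals(context_k, tg_results):
    """True if the context window and every TG measurement clear the targets."""
    context_ok = context_k >= MIN_CONTEXT_K
    tg_ok = all(tok_s > MIN_TG_TOK_S for tok_s in tg_results.values())
    return context_ok and tg_ok


# Relative drop in generation speed going from ~8k to ~60k context.
slowdown = 1 - tg_by_context["~60k"] / tg_by_context["~8k"]

print("Goals met:", meets_goals(context_window_k, tg_by_context))
print(f"TG slowdown from ~8k to ~60k context: {slowdown:.1%}")
```

So generation speed drops by roughly 30% between ~8k and ~60k context, but it stays just above the 10 tok/s target.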

Hope this comes in handy!

submitted by /u/Reddactor