Jetson Orin NX Build for Hermes Agent + Benchmarking

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I had a huge LLM server, and now I have a tiny one! I had a Jetson Orin NX gathering dust from a long-dead robotics project, back in the Llama-7B days. With MoE and smaller models now doing well, I figured it was time to mess with it again.

Goals:

  • As silent as possible (given the power limit was bumped from 25W to 40W)
  • More than 10 tok/s token generation (TG) and 300 tok/s prompt processing (PP)
  • At least 65K context for Hermes Agent
  • Must look cool AF 👌🏻

With those constraints, I had to take a hacksaw to the stock heatsink and make a new case. Then I tested way too many models (the expected ones, Gemma 4 and Qwen 3.6), in way too many quant variations.

It's all written up in the blog!

TL;DR: Gemma 4 26B A4B UD Q2_K_XL gives:

  • 66K context window
  • 14.65 tok/s at ~8k context
  • 10.21 tok/s at ~60k context
  • Still does an OK job at multiple tool calls on long prompts
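
For anyone who wants to sanity-check those numbers against the goals, here is a rough Python sketch. It is only an illustration, not part of the benchmark setup: the function names are made up, "K" is assumed to mean 1,000 tokens, the tok/s figures in the TL;DR are assumed to be generation (TG) rates, and the prefill estimate uses the 300 tok/s PP goal as a floor rather than a measured PP result.

```python
# Goals from the post; "K" is assumed to mean 1,000 tokens here.
GOAL_TG_TOK_S = 10.0    # token generation (TG) floor
GOAL_PP_TOK_S = 300.0   # prompt processing (PP) floor
GOAL_CONTEXT = 65_000

# Reported results for Gemma 4 26B A4B UD Q2_K_XL (assumed to be TG rates).
CONTEXT_WINDOW = 66_000
TG_AT_8K = 14.65
TG_AT_60K = 10.21


def meets_goals(context: int, tg_rates: list[float]) -> bool:
    """True when the context goal is met and every TG rate clears the floor."""
    return context >= GOAL_CONTEXT and all(r > GOAL_TG_TOK_S for r in tg_rates)


def tg_slowdown_pct(fast: float, slow: float) -> float:
    """Percentage drop in generation speed going from `fast` to `slow`."""
    return (1.0 - slow / fast) * 100.0


def prefill_seconds(prompt_tokens: int, pp_tok_s: float) -> float:
    """Seconds needed to ingest a prompt at a given PP rate."""
    return prompt_tokens / pp_tok_s


if __name__ == "__main__":
    print(meets_goals(CONTEXT_WINDOW, [TG_AT_8K, TG_AT_60K]))
    print(f"{tg_slowdown_pct(TG_AT_8K, TG_AT_60K):.1f}% slower at ~60k context")
    print(f"{prefill_seconds(60_000, GOAL_PP_TOK_S):.0f} s to ingest 60k tokens at the PP floor")
```

The last line is the one that matters for agent use: even at exactly the PP goal, a cold 60k-token prompt takes minutes to ingest, so the PP floor shapes how usable a long context is at least as much as the TG rate does.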

Hope this comes in handy!

submitted by /u/Reddactor