r/LocalLLaMA

Is there any <3B model with a usable 200k+ context window?

I need a small model for processing conversation transcripts from larger models, so I need a usable context window of at least 200k tokens. I know some models claim to support this, but I don’t know which ones actually handle it well in practice.
Also desirable: a low hallucination rate and output that isn't overly verbose.

Some clarifications: this is for an interpretability project that operates entirely in prefill, so I have no need to actually output tokens from the model. The size target is not about memory but about prefill latency and throughput, with 3B being the sweet spot between “probably fast enough” and “proven to be smart enough for this task in my experiments so far.”
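
For anyone comparing candidates on the same criteria, here is a minimal sketch of how prefill latency and throughput could be measured. It is model-agnostic: `time_prefill` and the `forward` callable are hypothetical names, and `forward` is assumed to wrap whatever single forward pass your framework runs over the full prompt, with no decoding.

```python
import time
from typing import Callable, Sequence


def time_prefill(
    forward: Callable[[Sequence[int]], object],
    token_ids: Sequence[int],
    repeats: int = 3,
) -> dict:
    """Time a prefill-only forward pass; report best latency and throughput.

    `forward` is a hypothetical stand-in for one pass of the model over the
    whole prompt. On a GPU it must block until the work is finished (for
    example by synchronizing the device), or the timing will be too low.
    """
    if not token_ids:
        raise ValueError("token_ids must be non-empty")
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        forward(token_ids)  # one pass over the prompt, no tokens generated
        latencies.append(time.perf_counter() - start)
    best = min(latencies)  # best-of-N reduces noise from warm-up runs
    n = len(token_ids)
    return {
        "tokens": n,
        "latency_s": best,
        "tokens_per_s": n / best if best > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Dummy forward pass so the sketch runs without any model installed.
    stats = time_prefill(lambda ids: sum(ids), list(range(200_000)))
    print(stats)
```

Swapping the dummy lambda for a real model call at 200k tokens gives a direct number for the “probably fast enough” side of the trade-off.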

Looks like Qwen 3.5-2B has the best chance of meeting these requirements. Will see if it works!

submitted by /u/madmax_br5