r/LocalLLaMA · June 7, 2026 · 3 min read

5 Months Later: open-deepthink Now Has Full Knowledge Distillation Mode

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Some of you might remember when I posted about this project back around September last year (it was called local-deepthink then). The core idea was to move past the usual flat multi-agent setups and instead build something that creates depth.

It already ran great locally with llama.cpp or via OpenRouter, and you could export the evolved networks for reuse. But the distillation piece was still coming together.

In this new mode you fire up a fixed 7-layer QNN topology, set a token budget, and let it run. The agents evolve live during the session: weak performers get replaced, knowledge is inherited, and collaboration deepens. At the end you get clean, structured JSON datasets containing the entire developmental trace of whatever knowledge could be extracted from your target LLM: every epoch, every agent's sub-task reasoning, every mutation, every difficulty ramp-up, plus a full topology_archive.json of the evolutionary history.

Say you want your fine-tune to contain everything Gemini knows about theosophy. High occultism is my particular use case, and for whatever reason the Google devs threw every book about fringe theosophy at Gemini's training. What do you do if you want Gemini's answers combining theosophy with hypotheticals about biology? Well, I humbly propose this distillation technique. If you can think of two topics that some closed-source model does great at but your open-source models are ignorant about, try distilling all the possible hypotheticals with this technique before you frame specific questions. It will cover the fundamentals of the topics and then go as deep into hallucinatory territory as you want, e.g. astrobiology crossed with questions about Italian cuisine at unreasonable levels of abstraction. open-deepthink is pretty much the ultimate software and collection of techniques for unreasonable excess.
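To make the JSON trace idea above concrete, here is a minimal sketch of flattening a per-epoch trace into prompt/response records for fine-tuning. The trace layout and every field name in it (`epochs`, `agents`, `agent_id`, `sub_task`, `reasoning`) are assumptions made up for illustration, not the actual open-deepthink export schema, so check the real files before adapting it.

```python
import json

# Hypothetical trace layout for illustration only.
# The real open-deepthink export schema may look quite different.
sample_trace = {
    "epochs": [
        {
            "epoch": 1,
            "agents": [
                {"agent_id": "L1-A1",
                 "sub_task": "Define the core terms.",
                 "reasoning": "First-pass definitions."},
                {"agent_id": "L1-A2",
                 "sub_task": "List open questions.",
                 "reasoning": ""},  # empty turn, skipped below
            ],
        },
        {
            "epoch": 2,
            "agents": [
                {"agent_id": "L1-A1",
                 "sub_task": "Combine both topics in one hypothetical.",
                 "reasoning": "A harder, cross-domain answer."},
            ],
        },
    ]
}


def trace_to_records(trace):
    """Flatten every non-empty agent turn into one prompt/response record."""
    records = []
    for epoch in trace.get("epochs", []):
        for agent in epoch.get("agents", []):
            prompt = agent.get("sub_task", "").strip()
            response = agent.get("reasoning", "").strip()
            if not prompt or not response:
                continue  # nothing useful to train on
            records.append({
                "prompt": prompt,
                "response": response,
                "epoch": epoch.get("epoch"),
                "agent_id": agent.get("agent_id"),
            })
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


records = trace_to_records(sample_trace)
print(to_jsonl(records))
```

Keeping the epoch and agent id on each record means you can later filter by difficulty ramp-up or drop output from agents that were replaced, if the real export exposes that information.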

I just shipped beta-0.0.3 today (11 bugs fixed, 195/195 tests passing, now officially rebranded to open-deepthink, with improved per-agent model selection and local stability). The repo is here if you want to try it:

https://github.com/iblameandrew/open-deepthink

If you want grok-heavy at API prices, please give this a try. You get an unlimited army of agents (hundreds, if you so desire) using whatever model you want, to think about whatever you need to debug... or to work through some hard theory-crafting problem. If you are stuck on a problem where opencode just doesn't find you a solution, then before you waste your own time reading the code, dump the whole stack trace into an open-deepthink topology with 20-100 agents. You will 100% get things moving. Maybe you want to review stocks, but the 70s of think-time you get from grok-heavy's 16 agents doesn't feel like enough? And you have 50 dollars of OpenRouter credits, you say? Fine. Fire up a 10x10 (100-agent) topology in open-deepthink's brainstorm mode and make it reflect on your problem for 40 epochs (60 hours). Let's be honest: at the end you'll get overcooked, expensive slop, but it will be your overcooked slop. And by the time it finishes, you will know that this is the best AI was able to do. Please give this a try, and double-check whether you've given it your support yet.
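For a sense of scale, the numbers in that example (10x10 topology, 40 epochs, 60 hours, 50 dollars) can be turned into a back-of-the-envelope estimate. This sketch assumes exactly one model call per agent per epoch and an evenly spent budget. Both are simplifying assumptions for illustration, not how open-deepthink actually schedules or bills its calls, and `run_estimate` is a made-up helper rather than part of the project.

```python
def run_estimate(rows, cols, epochs, total_hours, budget_usd):
    """Rough sizing for a grid topology run.

    Assumes one model call per agent per epoch and an evenly
    spent budget, which is a simplification for illustration.
    """
    agents = rows * cols
    agent_calls = agents * epochs
    return {
        "agents": agents,
        "agent_calls": agent_calls,
        "hours_per_epoch": total_hours / epochs,
        "usd_per_call": budget_usd / agent_calls,
    }


estimate = run_estimate(rows=10, cols=10, epochs=40, total_hours=60, budget_usd=50)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

Under those assumptions the run works out to 4,000 agent calls, 1.5 hours per epoch, and about 1.25 cents per call, which is a useful sanity check on whether your chosen model's pricing fits the budget.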

Cheers,
Andrew

submitted by /u/causality-ai