Releasing Apodex-1.0 Smol Models (0.8B, 2B, 4B Open-Weights) optimized for Agentic Verification + AgentHarness Evals
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hey r/LocalLLaMA,

We just released Apodex 1.0, and alongside our flagship API, we are releasing the weights for our Smol models (0.8B, 2B, and 4B).

Our core research focuses on independent verification in long-horizon tasks. Instead of just scaling up parameter sizes for raw generation, we’ve been experimenting with small, highly specialized local models that handle specific sub-tasks in an agentic loop (like source cross-examination, hypothesis testing, and tool-grounded synthesis).

We wanted to share the open weights and our evaluation harness with the community to get your thoughts on local agent workflows.

🧠 The Setup: What are these Smol models for?

When running long-horizon agents locally, using a massive 70B+ model for every single step (like checking if a URL is broken or verifying a regex) is incredibly inefficient. We specialized these 0.8B, 2B, and 4B models to act as sub-agents within our AgentOS runtime. They are trained to:

📊 Flagship Model Benchmarks (For Context)

To give you an idea of what the full architecture is capable of when these verification loops are running at scale, our flagship model (Apodex-1.0-H) achieved the following scores:

🛠️ Open-Source Components & Local Evals

We’ve open-sourced AgentHarness, which is the framework we use to test and evaluate these agentic workflows locally without drifting over 50+ steps. The open-weight models are hosted on Hugging Face, and the evaluation code is on GitHub.

(Note: To keep this post strictly compliant with the sub's rules, I’ve put all the Hugging Face links, GitHub repos, and the free early-access web platform in the stickied comment below.)

For those into local agent orchestration:

Would love to hear your feedback, and let me know if you want us to cook up some GGUF/EXL2 quants for these!
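
To make the routing idea from the setup section concrete, here is a minimal sketch of sending cheap verification steps to a small model and everything else to a large one. It is an illustration only, not the AgentOS implementation: the tier labels, the `Step` type, the `ROUTES` table, and both functions are hypothetical names chosen for this example, and the labels are not real model identifiers from the release.

```python
import re
from dataclasses import dataclass

# Hypothetical tier labels for illustration; NOT real model IDs from the release.
SMALL, MEDIUM, LARGE = "smol-0.8B", "smol-4B", "planner-70B"


@dataclass
class Step:
    kind: str     # e.g. "regex_check", "url_check", "cross_examine", "plan"
    payload: str  # the text the sub-agent would receive


# Assumed mapping from step kind to the cheapest tier that handles it.
ROUTES = {
    "regex_check": SMALL,
    "url_check": SMALL,
    "cross_examine": MEDIUM,
}


def route(step: Step) -> str:
    """Return the tier for a step; unknown kinds fall back to the large model."""
    return ROUTES.get(step.kind, LARGE)


def regex_compiles(pattern: str) -> bool:
    """Deterministic pre-check that needs no model call at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

The fallback to the large tier is a deliberate choice in this sketch: a step kind the table does not know about is treated as hard rather than silently handed to the smallest model.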