Cleo: trying to fit full analyst behavior in a 2B model [P]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Hello all!
Half of all industrial "chatbots" are just text-to-SQL models in a trenchcoat (and the other half RAG!). I wanted to explore just how small you could make these models if you trained, evaluated, and ran inference in the exact same structured harness, leading to Cleo: a Qwen3.5-2B-Base finetune.
Currently, some features of Cleo that are only possible or useful in a unified harness are:
- Training on the exact same gather, repair, and answer contract it uses at inference time
- Searching over candidate queries with live execution evidence, not just model likelihood
- Co-designing the model contract, SQL safety layer, dialect handling, timeouts, and clarification behavior as one system
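To make the second bullet concrete, here is a rough sketch of what "searching over candidate queries with live execution evidence" could look like. This is not code from the Cleo repo: the function names (`is_read_only`, `run_candidate`, `pick_candidate`), the toy `orders` table, and the scoring rule are all assumptions made for illustration, and it uses SQLite only because it ships with Python. Timeouts and dialect handling are left out.

```python
import sqlite3


def is_read_only(sql: str) -> bool:
    """Crude safety gate (assumption): accept only a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject stacked statements
        return False
    first = stripped.split(None, 1)[0].lower() if stripped else ""
    return first == "select"


def run_candidate(conn, sql, max_rows=50):
    """Execute one candidate and return execution evidence instead of raising."""
    if not is_read_only(sql):
        return {"sql": sql, "ok": False, "error": "rejected by safety gate", "rows": []}
    try:
        rows = conn.execute(sql).fetchmany(max_rows)
        return {"sql": sql, "ok": True, "error": None, "rows": rows}
    except sqlite3.Error as exc:
        return {"sql": sql, "ok": False, "error": str(exc), "rows": []}


def pick_candidate(conn, candidates):
    """candidates: list of (sql, model_logprob) pairs.

    Hypothetical ranking: executable beats failing, non-empty beats empty,
    and model likelihood only breaks ties.
    """
    scored = []
    for sql, logprob in candidates:
        evidence = run_candidate(conn, sql)
        key = (evidence["ok"], bool(evidence["rows"]), logprob)
        scored.append((key, evidence))
    return max(scored, key=lambda item: item[0])[1]


# Toy database standing in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 5.0)],
)

candidates = [
    # Highest likelihood, but references a column that does not exist.
    ("SELECT region, SUM(amount) FROM orders GROUP BY region", -0.2),
    # Blocked by the safety gate before it ever runs.
    ("DROP TABLE orders", -0.5),
    # Lowest likelihood, but it executes and returns rows.
    ("SELECT region, SUM(total) FROM orders GROUP BY region ORDER BY region", -0.9),
]

best = pick_candidate(conn, candidates)
print(best["sql"])
print(best["rows"])
```

In this toy run the least likely candidate is chosen, because it is the only one that passes the gate and actually executes. A ranking based on model likelihood alone would have picked the first query, which fails on the missing column. The failing candidate's error string is also the kind of evidence a repair step could feed back to the model.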
Everything is completely open-source, including the harness, model, and datasets.
GitHub: https://github.com/Dreeseaw/cleo
Hugging Face model: https://huggingface.co/dreeseaw/cleo
PS: If you're also resource-constrained and trying to do RL like me, I would highly recommend experimenting with ECHO: https://arxiv.org/abs/2605.24517