Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever. The method (DV-DPO):
Only genuine revisions under adversarial pressure become training signal. Not format preference, not sampling variance. Results:
Autonomous loop now running: GGUF ready for Ollama. Happy to share the pipeline if there's interest.
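A minimal sketch of the "only genuine revisions become training signal" filter, not the actual pipeline, which the post does not show. Everything here is an assumption made for illustration: the `Trial` record, the `normalize` heuristic, and the choice to treat the post-challenge answer as `chosen` and the pre-challenge answer as `rejected`. It covers only the formatting-only case; ruling out sampling variance would need an additional check, such as re-sampling without the challenge.

```python
import re
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical record of one adversarial round (not from the post)."""
    prompt: str
    initial: str   # answer before the adversarial challenge
    revised: str   # answer after the adversarial challenge


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so that
    formatting-only edits compare as equal."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def is_genuine_revision(trial: Trial) -> bool:
    """Assumed heuristic: a revision counts only if the text still
    differs after formatting has been normalized away."""
    return normalize(trial.initial) != normalize(trial.revised)


def build_pairs(trials: list[Trial]) -> list[dict]:
    """Keep only trials with a substantive revision and emit them in
    the prompt/chosen/rejected layout commonly used for DPO datasets.
    Assumption: the revised answer is the preferred one."""
    return [
        {"prompt": t.prompt, "chosen": t.revised, "rejected": t.initial}
        for t in trials
        if is_genuine_revision(t)
    ]


trials = [
    # Formatting-only change: dropped by the filter.
    Trial("Ship now or wait?", "Option A.", "option a"),
    # Substantive change after the challenge: kept.
    Trial("Ship now or wait?", "Option A", "Option B, because of rollback risk"),
]
pairs = build_pairs(trials)
```

With the two example trials, only the second one survives the filter, since the first differs in case and punctuation alone.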