Reasoning, but without actually *drafting* replies?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I've been experimenting a bit today with letting models reason for creative tasks, rationale being that it might help with keeping track of details and prompt adherence. And predictably, the wall I'm running into is that they all want to draft, check, refine, revise, "um actually...", and draft again before actually giving the output.
This is hugely wasteful no matter the reply length, but it's especially bad if you want more than a paragraph or two. Obviously I'm not the first person to realize this, I'm guessing it's the reason nobody uses reasoning for creative in the first place.
I tried with both Gemma 4 and Qwen3.6, and it seems like prompting isn't enough to actually control the reasoning process for either model - I can only add steps to the reasoning, not really remove them. I'm guessing it's a "built-in" methodology and fighting it is not really worth doing.
But I figure, maybe there is a method or workaround I don't know about. Some jinja template wizardry that just werks? Or maybe a good fine tune, or some other reasoning model that took this inefficiency into consideration? Or maybe someone can just confirm that this isn't really fixable for an end user.
Open to ideas/opinions/insults
[link] [comments]
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.