r/LocalLLaMA

Why Does Thinking Output More Tokens Than a Response?


I was too lazy to set up a vector DB, embeddings, and clustering for a list of 1000 items I wanted to categorize. I was hoping a local LLM could do it instead, but it would only respond with a list of about 100 items and their categories.

That confused me, because in the "thinking" part of its output the LLM would reproduce every token of the input along with a massive amount of reasoning text. From what I've seen, you'd need a specialized model for that kind of long output, but... it seems like the "feature" is already in most models.

What's up with that?
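
For reference, a minimal sketch of one possible workaround for the truncated response: split the list into small batches so that each reply only has to cover a few dozen items, then merge the results. Everything named here is an assumption for illustration. `classify_batch` is a hypothetical callable that would wrap whatever local LLM call is in use and return an item-to-category mapping, `fake_classify` is a stand-in so the sketch runs without a model, and the batch size of 50 is an arbitrary starting point, not a measured limit.

```python
from typing import Callable, Dict, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def categorize_in_batches(
    items: List[str],
    classify_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 50,
) -> Dict[str, str]:
    """Categorize items batch by batch and merge the per-batch results.

    `classify_batch` is a hypothetical wrapper around a local LLM call.
    """
    results: Dict[str, str] = {}
    for batch in chunk(items, batch_size):
        labels = dict(classify_batch(batch))
        # Ask once more for anything the model skipped in its reply.
        missing = [item for item in batch if item not in labels]
        if missing:
            labels.update(classify_batch(missing))
        results.update(labels)
    return results


def fake_classify(batch: List[str]) -> Dict[str, str]:
    """Stand-in for the model: labels items by string length."""
    return {item: ("short" if len(item) < 8 else "long") for item in batch}


if __name__ == "__main__":
    items = [f"item-{i}" for i in range(1000)]
    categories = categorize_in_batches(items, fake_classify, batch_size=50)
    print(len(categories))
```

With a real model, `classify_batch` would also need to parse the reply into a dictionary, and any items still missing after the single retry would simply be absent from the result.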

submitted by /u/iMakeSense