MTP and QTA - what is the relation?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I'm an old guy and I hate it when things change this fast, surrounded by noise and breaking news!
MTP, I know what the acronym means and where it excels.
Gemma4 31b dense is my target.
Unsloth, Google, GGUF, tensors... too much overlapping information. I hate it when I see no clear path.
Please help me...
FACT 1 = MTP has been merged into llama.cpp
FACT 2 = old GGUFs are not compatible
FACT 3 = I need a second file to load with the GGUF
Are these facts correct?
Which GGUF is ok?
Why did Unsloth add the "QTA" magic string to its filenames with no clear relation to use cases?
Don't point me to hf/SomeRandomUsername/gemma4-31b-it-SomeRandomShit because I do not want to test some random GGUF.
I would like to test the baseline/official asset to form my own opinion.
I'm not a bad person, but the internet, blogs and forums are now like an Istanbul bazaar where at every step you have to dodge a scam/ad/shit.
Peace.
--- edit ---
QAT, not QTA.
That is proof I'm not a bot, lol...