r/LocalLLaMA

NVFP4 on llama.cpp?


Hey everyone,

Even though I check the subreddit daily, some things are hard for me to grasp because of the speed at which progress is made (really impressive!). I tried researching this with DeepSeek V4, but it left me even more puzzled.

Recently I saw NVFP4 support being merged into llama.cpp. Since I have two RTX 5060 Ti cards, I would love to make use of it, but I haven't fully grasped how.

I also saw that someone released NVFP4 quants of Gemma4 QAT:
https://huggingface.co/melcheikh/gemma-4-31B-it-qat-NVFP4-Blackwell
https://huggingface.co/melcheikh/gemma-4-31B-it-qat-assistant-NVFP4-Blackwell

They seem interesting to use, but no GGUFs are available for them.

Judging from my Reddit search results (https://www.reddit.com/r/LocalLLaMA/comments/1systb1/llamacpp_nvfp4_native_support_on_blackwell_from/), I think I need to produce the GGUF file myself.

I guess my questions are:

  • When converting NVFP4 safetensors to GGUF, is it the same process as with other quant types (like I did here: https://huggingface.co/nohurry/gemma-4-26B-A4B-it-heretic-GUFF/blob/main/REPRODUCE.md), or are there specific layers I should pay attention to when quantizing NVFP4 safetensors?
  • When converting NVFP4 safetensors to GGUF, should I generate and apply an imatrix dataset too?
  • Any NVFP4 safetensors / NVFP4 GGUF providers you can recommend?

Sorry if my questions are a bit unclear, English isn't my native language.
Please correct me if I make mistakes!
And thank you for reading, your advice would be really appreciated.

submitted by /u/Kahvana
