NVFP4 on llama.cpp?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hey everyone,
Even though I check the subreddit daily, some things are hard for me to grasp because of how fast progress is being made (really impressive!). I tried researching it with DeepSeek V4, but that left me even more puzzled.
Recently I saw that NVFP4 support was merged into llama.cpp. Since I have two RTX 5060 Ti cards, I would love to make use of it, but I don't fully understand how.
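For context on what NVFP4 checkpoints contain: as NVIDIA describes the format, each weight is a 4-bit E2M1 float (1 sign bit, 2 exponent bits, 1 mantissa bit), every block of 16 values shares one FP8 (E4M3) scale, and a second FP32 scale applies to the whole tensor. The Python sketch below dequantizes such blocks under simplifying assumptions: the 4-bit codes are already unpacked one per int (real checkpoints pack two per byte), and the block scales are already decoded from FP8 to floats. The names `decode_e2m1` and `dequantize_nvfp4` are hypothetical and do not reflect llama.cpp's internal layout or API.

```python
# Magnitudes of the 8 non-negative FP4 E2M1 codes (bits: sign | 2 exponent | 1 mantissa).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale factor across 16 consecutive values


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code (0..15) to a float; bit 3 is the sign."""
    if not 0 <= code <= 15:
        raise ValueError(f"not a 4-bit code: {code}")
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def dequantize_nvfp4(codes: list[int], block_scales: list[float],
                     tensor_scale: float) -> list[float]:
    """Reconstruct float weights from NVFP4-style blocks (illustrative sketch).

    codes:        unpacked 4-bit codes (assumption: one code per int)
    block_scales: one scale per 16-code block (assumption: already decoded from FP8 E4M3)
    tensor_scale: the per-tensor FP32 scale
    """
    if len(codes) != len(block_scales) * BLOCK_SIZE:
        raise ValueError("expected exactly one block scale per 16 codes")
    out = []
    for i, code in enumerate(codes):
        # Effective scale = this block's scale times the per-tensor scale.
        scale = block_scales[i // BLOCK_SIZE] * tensor_scale
        out.append(decode_e2m1(code) * scale)
    return out


weights = dequantize_nvfp4(list(range(16)), [2.0], 0.5)
print(weights[:8])  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

The two-level scaling is why only 16 values share each FP8 scale: small blocks let the 4-bit values track local magnitude, while the FP32 tensor scale keeps the FP8 block scales within their representable range.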
I also saw that someone released NVFP4 quants of Gemma 4 QAT:
https://huggingface.co/melcheikh/gemma-4-31B-it-qat-NVFP4-Blackwell
https://huggingface.co/melcheikh/gemma-4-31B-it-qat-assistant-NVFP4-Blackwell
They look interesting to try, but no GGUFs are available for them.
Judging from my Reddit search results ( https://www.reddit.com/r/LocalLLaMA/comments/1systb1/llamacpp_nvfp4_native_support_on_blackwell_from/ ), I think I need to produce the GGUF file myself.
I guess my questions are:
- When converting NVFP4 safetensors to GGUF, is the process the same as for other quant types (like I did here: https://huggingface.co/nohurry/gemma-4-26B-A4B-it-heretic-GUFF/blob/main/REPRODUCE.md), or are there specific layers I should pay attention to when quantizing NVFP4 safetensors?
- When converting NVFP4 safetensors to GGUF, should I also generate an imatrix from a calibration dataset and apply it?
- Are there any providers of NVFP4 safetensors or NVFP4 GGUFs you can recommend?
Sorry if my questions are a bit unclear, English isn't my native language.
Please correct me if I make mistakes!
And thank you for reading, your advice would be really appreciated.