NVFP4 Kimi K2.6 and Kimi K2.5 released by NVIDIA
Mirrored from r/LocalLLaMA.
The NVIDIA Kimi-K2.6-NVFP4 model is a quantized version of Moonshot AI's Kimi-K2.6, an auto-regressive language model built on an optimized transformer architecture. For more information, please check here. The NVFP4 version was quantized with NVIDIA's Model Optimizer.
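As a rough illustration of what NVFP4-style quantization does, here is a minimal pure-Python sketch: 4-bit E2M1 values sharing one scale per block. The block handling, scale encoding (real NVFP4 stores block scales in FP8 and uses hardware rounding), and the scale-selection rule are simplified assumptions for illustration, not NVIDIA's implementation.

```python
# Simplified NVFP4-style block quantization (illustrative only; real NVFP4
# uses 16-element blocks with FP8 block scales plus a per-tensor scale).

# E2M1 representable magnitudes; with a sign bit this is the full 4-bit grid.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GRID = sorted({s * v for v in E2M1 for s in (-1.0, 1.0)})

def quantize_block(block):
    """Pick a scale so the largest |x| maps to 6.0 (the E2M1 max),
    then snap each scaled value to the nearest grid point."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    q = [min(GRID, key=lambda g: abs(x / scale - g)) for x in block]
    return scale, q

def dequantize_block(scale, q):
    """Reconstruct approximate values from the shared scale and 4-bit codes."""
    return [scale * v for v in q]
```

Values that already sit on the scaled grid round-trip exactly; everything else lands on the nearest representable point, which is where the (small) accuracy deltas in the table below come from.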
This model is ready for commercial/non-commercial use.
The accuracy benchmark results are presented in the table below:
| Precision | GPQA Diamond | SciCode | τ²-Bench Telecom | MMMU Pro | AA-LCR | IFBench |
|---|---|---|---|---|---|---|
| Baseline (INT4) | 90.9 | 52.6 | 98.2 | 75.6 | 71.0 | 73.9 |
| NVFP4 | 90.4 | 54.4 | 98.0 | 76.5 | 71.8 | 73.9 |
Baseline: Kimi-K2.6 in its native INT4 format. Both configurations were benchmarked with temperature = 1.0, top_p = 0.95, and a maximum of 128,000 tokens.
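The benchmark settings above use standard nucleus (top-p) sampling. A minimal sketch of that procedure, using made-up logits rather than anything model-specific:

```python
import math
import random

def top_p_sample(logits, top_p=0.95, temperature=1.0, rng=random):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches top_p, then sample from that renormalized set."""
    # Softmax with temperature scaling.
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Accumulate tokens in descending-probability order until top_p is covered.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Sample within the kept set, proportional to the original probabilities.
    total = sum(probs[i] for i in kept)
    r = rng.random() * total
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With temperature = 1.0 and top_p = 0.95, only the lowest-probability tail of the vocabulary is excluded at each step, so the evaluation exercises the model's full distribution rather than greedy decoding.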