NVIDIA Developer Blog · 1 min read

Creating the NVIDIA Nemotron 3 Ultra NVFP4 Checkpoint with NVIDIA Model Optimizer

Mirrored from NVIDIA Developer Blog for archival readability. Support the source by reading on the original site.

As context windows grow longer, moving large model weights efficiently becomes critical to performance. A common way to address this is quantization, an optimization technique that compresses model weights into a smaller data format. One such format is NVFP4, a 4-bit floating-point format introduced with the NVIDIA Blackwell architecture. That’s the approach behind our new Nemotron 3…
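
To make the idea of compressing weights into a 4-bit floating-point format concrete, here is a minimal sketch in plain Python. It is not the NVIDIA Model Optimizer workflow or its API. It assumes the commonly described E2M1 value grid for 4-bit floats and one shared scale per block of 16 values, and it keeps that scale in full precision for simplicity. The names `FP4_E2M1_GRID`, `BLOCK_SIZE`, `quantize_block`, `dequantize_block`, and `fake_quantize` are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: simulates block-scaled 4-bit float quantization.
# Assumption: magnitudes representable by an E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# Assumption: one shared scale factor per block of 16 values.
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes); each code is a signed value from the grid."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest grid value.
    # The scale stays a Python float here; a real format would store it
    # in a lower-precision type.
    scale = amax / FP4_E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        # Round to the nearest representable magnitude.
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from a scale and its grid codes."""
    return [c * scale for c in codes]


def fake_quantize(weights):
    """Quantize then dequantize a flat list of weights, block by block."""
    out = []
    for i in range(0, len(weights), BLOCK_SIZE):
        scale, codes = quantize_block(weights[i:i + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out
```

Calling `fake_quantize` on a list of weights returns the values a 4-bit representation would reproduce under these assumptions, so comparing the result with the input gives a rough feel for the rounding error that quantization introduces. Each stored element needs only 4 bits plus its share of the block scale, which is where the memory and bandwidth savings come from.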
