Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron
Mirrored from NVIDIA Developer Blog for archival readability. Support the source by reading on the original site.
Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. More recently, these methods have achieved significant success when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz) was used to train some of today’s best open-source models, including Kimi K2 and GLM-5.
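To make the name concrete: Muon keeps a momentum buffer for each 2D weight matrix and, instead of applying that buffer directly, replaces it with an approximately orthogonal matrix computed by Newton-Schulz iteration. The following is a minimal pure-Python sketch under stated assumptions, not the Megatron implementation. It uses the classic cubic Newton-Schulz iteration rather than the tuned quintic polynomial used in the public Muon reference code, and it omits Nesterov momentum and any shape-dependent update scaling. The function names (`newton_schulz_orthogonalize`, `muon_step`) and hyperparameter values (`lr`, `beta`) are hypothetical and chosen only for illustration.

```python
import math

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frobenius_norm(a: Matrix) -> float:
    return math.sqrt(sum(x * x for row in a for x in row))


def newton_schulz_orthogonalize(g: Matrix, steps: int = 30, eps: float = 1e-7) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g, where g = U S V^T.

    Classic cubic iteration: X <- 1.5 X - 0.5 X X^T X. Dividing by the
    Frobenius norm first puts every singular value in (0, 1], inside the
    iteration's convergence region (0, sqrt(3)); each step then pushes the
    singular values toward 1 while leaving U and V unchanged.
    """
    norm = frobenius_norm(g) + eps
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxt_x)]
    return x


def muon_step(weight: Matrix, grad: Matrix, momentum: Matrix,
              lr: float = 0.02, beta: float = 0.95) -> tuple[Matrix, Matrix]:
    """One simplified (hypothetical) Muon-style update for a single 2D weight.

    Returns (new_weight, new_momentum) instead of mutating in place.
    """
    # Heavy-ball momentum: m <- beta * m + grad
    momentum = [[beta * m + g for m, g in zip(mr, gr)] for mr, gr in zip(momentum, grad)]
    # Orthogonalize the momentum so that all its singular values become ~1
    update = newton_schulz_orthogonalize(momentum)
    # Plain descent step with the orthogonalized direction
    weight = [[w - lr * u for w, u in zip(wr, ur)] for wr, ur in zip(weight, update)]
    return weight, momentum


if __name__ == "__main__":
    w, m = muon_step([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 0.5]], [[0.0, 0.0], [0.0, 0.0]])
    print(w)
```

In the usage example, the gradient `diag(2.0, 0.5)` has very different singular values, yet its orthogonalized form is close to the identity, so both diagonal weights move by the same amount (about `lr`). This equalization of update magnitude across directions is the core idea behind the orthogonalization. The iteration needs only matrix multiplications (no SVD), which is the usual argument for why it is practical on accelerators.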