NVIDIA Developer Blog · 1 min read

Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron

Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. More recently, these methods have achieved significant success when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz) was used to train some of today's best open-source models, including Kimi K2 and GLM-4.5.
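
To make the name concrete: Muon applies standard momentum to each 2-D weight matrix, then approximately orthogonalizes the momentum matrix with a few Newton-Schulz iterations before using it as the update. Below is a minimal PyTorch sketch based on the publicly available Muon reference implementation; the function names and hyperparameter defaults are illustrative and this is not the Megatron integration the post describes.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients from the public Muon reference code
    X = G / (G.norm() + eps)            # Frobenius normalization bounds the spectral norm by 1
    transposed = X.size(0) > X.size(1)
    if transposed:                      # iterate on the wide orientation for efficiency
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_update(param: torch.Tensor, grad: torch.Tensor,
                momentum: torch.Tensor, lr: float = 0.02, beta: float = 0.95) -> None:
    """One illustrative Muon step: accumulate momentum, orthogonalize, apply."""
    momentum.mul_(beta).add_(grad)       # standard momentum accumulation
    update = newton_schulz(momentum)     # MomentUm Orthogonalized by Newton-Schulz
    param.add_(update, alpha=-lr)        # apply the (approximately) orthogonal update
```

The quintic iteration's coefficients are tuned so that a handful of steps pushes the update's singular values toward 1, which is why around five iterations typically suffice in practice.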
