Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage
Mirrored from arXiv — Machine Learning for archival readability. Support the source by reading on the original site.
Computer Science > Machine Learning
Title:Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage
Abstract:Shampoo-based methods, such as KL-Shampoo and SOAP, have demonstrated strong performance in training neural networks and rely on QR decomposition. Because existing QR implementations require single-precision (FP32) arithmetic and remain computationally expensive, these methods become time- and memory-intensive when their preconditioning matrices are large. Moreover, using BFloat16 (BFP16) storage to reduce memory usage can degrade the performance of Shampoo-based methods. We propose a reparametrization of the preconditioner that supports BFP16 storage and forms a complete basis by combining updated basis vectors with unchanged ones. By updating only part of the basis through QR decomposition in a subspace, our approach reduces computational overhead while mitigating the performance degradation caused by BFP16 storage. Our approach applies broadly to Shampoo-based methods that employ QR decomposition, including KL-Shampoo, SOAP, and KL-SOAP. In particular, it improves the performance of SOAP and KL-SOAP under BFP16 storage, enabling KL-SOAP to match or exceed KL-Shampoo. Overall, our approach makes Shampoo-based methods more memory- and time-efficient.
| Comments: | Preprint, working in progress |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.26327 [cs.LG] |
| (or arXiv:2605.26327v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2605.26327
arXiv-issued DOI via DataCite (pending registration)
|
Access Paper:
- View PDF
- TeX Source
References & Citations
Bibliographic and Citation Tools
Code, Data and Media Associated with this Article
Demos
Recommenders and Search Tools
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
More from arXiv — Machine Learning
-
GEM: Geometric Entropy Mixing for Optimal LLM Data Curation
May 27
-
The Constraint Tax: Measuring Validity-Correctness Tradeoffs in Structured Outputs for Small Language Models
May 27
-
AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion
May 27
-
SilIF: Silhouette-Augmented Isolation Forest for Unsupervised Transaction Fraud Detection
May 27
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.