I Found a Hidden Ratio in Transformers That Predicts Geometric Stability [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
I analyzed several decoder-only transformer models using Lyapunov spectral analysis and found that the ratio of the MLP spectral norm to the attention spectral norm strongly predicts whether the model's representations will collapse to rank-1 by the final layers.
In my experiments, keeping this spectral ratio roughly between 0.5 and 2 kept the model stable through the final layers.
Paper/Github repo: https://github.com/yousef-rafat/the-1-1-rule
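For readers who want to probe this on their own weights, here is a minimal sketch of computing the MLP/attention spectral-norm ratio for one layer. The weight matrices below are random stand-ins (the real ones would come from a trained checkpoint), and the `spectral_norm` helper is just the largest singular value; this is my illustration, not code from the linked repo.

```python
import numpy as np

def spectral_norm(W):
    # Largest singular value of W, i.e. the operator 2-norm.
    return np.linalg.norm(W, ord=2)

rng = np.random.default_rng(0)
d = 64  # hidden size (hypothetical)

# Stand-ins for one decoder layer's projection matrices; in practice
# you would load these from a trained model's state dict.
W_attn = rng.normal(scale=0.02, size=(d, d))       # attention output projection
W_mlp = rng.normal(scale=0.02, size=(4 * d, d))    # MLP up-projection

ratio = spectral_norm(W_mlp) / spectral_norm(W_attn)
print(f"MLP/attention spectral-norm ratio: {ratio:.3f}")
```

Per the post's claim, you would check this ratio layer by layer and flag layers where it drifts well outside the 0.5–2 band.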