I Found a Hidden Ratio in Transformers That Predicts Geometric Stability [R]
I analyzed several decoder-only transformer models using Lyapunov spectral analysis and found that the ratio of the MLP spectral norm to the attention spectral norm strongly predicts whether a model will collapse to rank-1 by its final layers.
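To make the collapse claim concrete, here's a minimal sketch of one way to watch for it. This is my own illustration, assuming "rank-1 collapse" refers to the hidden states; the linked repo defines the actual diagnostic:

```python
# Minimal sketch (my illustration, not the repo's method): track the
# effective rank of each layer's hidden states on a single prompt.
# A collapsing model would show this trending toward 1 in late layers.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for i, h in enumerate(out.hidden_states):
    # Effective rank = exp(entropy of the normalized singular values)
    # (Roy & Vetterli, 2007); it equals 1 exactly for a rank-1 matrix.
    s = torch.linalg.svdvals(h[0])        # h[0]: (seq_len, d_model)
    p = s / s.sum()
    eff_rank = torch.exp(-(p * (p + 1e-12).log()).sum()).item()
    print(f"layer {i:2d}: effective rank ~ {eff_rank:.2f}")
```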
In my experiments, keeping this spectral ratio roughly between 0.5 and 2 worked best for keeping the model stable through the final layers.
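If you want to eyeball the ratio on your own checkpoints, here's a rough sketch. Which weight matrices define the "MLP" and "attention" norms is my assumption (the output projection of each sub-block); the repo below defines the exact quantity:

```python
# Rough sketch of a per-layer MLP/attention spectral-norm ratio on GPT-2.
# Assumption (mine): compare the output projections of the two sub-blocks.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

for i, block in enumerate(model.transformer.h):
    # Spectral norm = largest singular value of the weight matrix.
    attn_norm = torch.linalg.matrix_norm(block.attn.c_proj.weight, ord=2)
    mlp_norm = torch.linalg.matrix_norm(block.mlp.c_proj.weight, ord=2)
    ratio = (mlp_norm / attn_norm).item()
    flag = "" if 0.5 <= ratio <= 2.0 else "  <- outside the 0.5-2 band"
    print(f"layer {i:2d}: mlp/attn spectral ratio = {ratio:.3f}{flag}")
```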
Paper/GitHub repo: https://github.com/yousef-rafat/the-1-1-rule