Neuron Populations Exhibit Divergent Selectivity with Scale [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
| Hi! We just released a paper where we study “Rosetta Neurons”: universal neurons across different neural networks, and their relationship to scaling laws, specialization, and monosemanticity. Would love to kick off a discussion and get the community's thoughts. Main Findings: We find that the universal Rosetta Neurons scale as a sublinear power law: larger models have more of them, but they occupy a shrinking fraction of all neurons. They also become more selective/monosemantic and more specialized with scale. We can use a single Rosetta Neuron to filter data for continued pretraining and nearly match oracle data filtering. Paper: https://arxiv.org/abs/2606.03990 Summary thread: https://x.com/_AmilDravid/status/2062959617941074069?s=20 Code: https://github.com/avdravid/rosetta-neuron-scaling Project page: https://avdravid.github.io/rosetta-neuron-scaling/ [link] [comments] |
More from r/MachineLearning
-
Loss functions in Instance Representation Learning [R]
Jun 29
-
Price elasticity model [R]
Jun 29
-
Rejected MICCAI paper: workshop -> journal/conference or directly journal/conference [R]
Jun 29
-
I built a demo agricultural planning system with an AI advisor for small-scale farmers in Nicaragua using NASA data [p]
Jun 29
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.