I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
RPS (Regressive Plasticity Schedule) is inspired by neuroscience. Humans learn basic skills as children, when neuroplasticity is high, and advanced skills as teens and adults, when neuroplasticity is lower. RPS trains a model in two stages. In stage 1, the model is trained on easy data with a high learning rate. In stage 2, the model is trained on hard data with a learning rate that is 10% of the stage 1 rate. RPS is essentially a combination of two existing ideas: curriculum learning and learning rate decay.
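The two-stage idea can be sketched as a simple schedule. This is only an illustration under assumptions, not the author's training code: the names `Stage`, `two_stage_schedule`, and `lr_at_step`, the base learning rate, and the step counts are all hypothetical. Only the 10% ratio and the easy-then-hard ordering come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    data: str             # hypothetical dataset label ("easy" or "hard")
    learning_rate: float
    steps: int


def two_stage_schedule(base_lr: float, easy_steps: int, hard_steps: int,
                       stage2_ratio: float = 0.1) -> List[Stage]:
    """Build an easy-then-hard plan; stage 2 uses stage2_ratio * base_lr."""
    return [
        Stage("stage1", "easy", base_lr, easy_steps),
        Stage("stage2", "hard", base_lr * stage2_ratio, hard_steps),
    ]


def lr_at_step(stages: List[Stage], step: int) -> float:
    """Return the learning rate in effect at a global training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    remaining = step
    for stage in stages:
        if remaining < stage.steps:
            return stage.learning_rate
        remaining -= stage.steps
    raise ValueError("step is past the end of the schedule")


# Illustrative values only; the post does not state them.
rps = two_stage_schedule(base_lr=1e-4, easy_steps=1000, hard_steps=1000)
# The equal-learning-rate baseline keeps the same rate in both stages.
eps = two_stage_schedule(base_lr=1e-4, easy_steps=1000, hard_steps=1000,
                         stage2_ratio=1.0)
```

In a real run, `lr_at_step` would feed whatever optimizer or scheduler the training framework provides. It is kept framework-free here so the schedule itself is easy to inspect.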
ARC-AGI 1 public eval scores (base model: Qwen3-8b):
RPS: 4%
EPS (equal learning rate in both stages): 2.4%
Program Synthesis Stats:
Program executions without error:
RPS: 1145/1200
EPS: 870/1200
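For easier comparison, the execution counts above can be converted to rates. This small sketch uses only the reported counts; the variable and function names are illustrative.

```python
# Error-free program executions out of total, as reported in the post.
runs = {"RPS": (1145, 1200), "EPS": (870, 1200)}


def success_rate(ok: int, total: int) -> float:
    """Fraction of program executions that finished without error."""
    return ok / total


rates = {name: success_rate(ok, total) for name, (ok, total) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
# RPS: 95.4%
# EPS: 72.5%

gap = rates["RPS"] - rates["EPS"]
print(f"gap: {gap * 100:.1f} percentage points")
# gap: 22.9 percentage points
```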
https://iamjasonfeng.blogspot.com/2026/05/regressive-plasticity-schedule.html