r/MachineLearning

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]


RPS (Regressive Plasticity Schedule) is inspired by neuroscience. Humans learn basic skills as kids, when neuroplasticity is high, and advanced skills as teens and adults, when neuroplasticity is lower. RPS trains a model in 2 stages. In stage 1, the model is trained on easy data with a high learning rate. In stage 2, the model is trained on hard data at 10% of the stage 1 learning rate. RPS is basically a combination of two existing ideas: curriculum learning and learning rate decay.
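To make the two-stage schedule concrete, here is a minimal sketch of how the learning rate could be looked up per training step. This is not the code from the linked repo: the names `Stage`, `make_stages`, and `lr_at_step`, the base learning rate of 1e-4, and the step counts are all made up for illustration. The only detail taken from the post is that stage 2 uses 10% of the stage 1 learning rate, and that EPS keeps the rate equal across both stages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str             # which data split this stage trains on
    num_steps: int        # number of optimizer steps in the stage
    learning_rate: float  # constant learning rate used during the stage


def make_stages(base_lr, easy_steps, hard_steps, stage2_ratio=0.1):
    """Build a two-stage schedule.

    stage2_ratio=0.1 gives the RPS setup described in the post;
    stage2_ratio=1.0 gives the EPS baseline (equal learning rate).
    """
    return [
        Stage("easy", easy_steps, base_lr),
        Stage("hard", hard_steps, base_lr * stage2_ratio),
    ]


def lr_at_step(stages, step):
    """Return (stage name, learning rate) for a 0-indexed global step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    stage_start = 0
    for stage in stages:
        if step < stage_start + stage.num_steps:
            return stage.name, stage.learning_rate
        stage_start += stage.num_steps
    raise ValueError("step is past the end of training")


# Hypothetical values, chosen only for illustration.
rps = make_stages(base_lr=1e-4, easy_steps=1000, hard_steps=1000)
eps = make_stages(base_lr=1e-4, easy_steps=1000, hard_steps=1000, stage2_ratio=1.0)

for step in (0, 999, 1000, 1999):
    print(step, lr_at_step(rps, step), lr_at_step(eps, step))
```

In a real training loop, the returned learning rate would be written into the optimizer before each step, and the data loader would be switched from the easy set to the hard set at the stage boundary.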

ARC-AGI 1 public eval scores (base model: Qwen3-8b):

RPS: 4%

EPS (equal learning rate in both stages): 2.4%

Program synthesis stats (program executions without error):

RPS: 1145/1200

EPS: 870/1200
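For easier comparison, the execution counts above can be turned into rates. The short script below only does arithmetic on the numbers reported in this post; the variable names are just for illustration. It works out to roughly 95.4% clean executions for RPS versus 72.5% for EPS, or 55 failed executions versus 330.

```python
TOTAL_EXECUTIONS = 1200

# Program executions without error, as reported in the post.
clean_runs = {"RPS": 1145, "EPS": 870}

rates = {name: count / TOTAL_EXECUTIONS for name, count in clean_runs.items()}
failures = {name: TOTAL_EXECUTIONS - count for name, count in clean_runs.items()}

for name in clean_runs:
    print(f"{name}: {rates[name]:.1%} clean, {failures[name]} failed")

# Gap between the two runs, in percentage points.
gap_points = (rates["RPS"] - rates["EPS"]) * 100
print(f"gap: {gap_points:.1f} percentage points")
```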

https://iamjasonfeng.blogspot.com/2026/05/regressive-plasticity-schedule.html

https://github.com/iamjasonfeng/RPS

submitted by /u/iamjasonfeng