I trained a 75M parameter LLM from scratch on 18B tokens and it beats a model almost double its size

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I trained a small language model from scratch called KeyLM. It is 75M params, decoder-only, and there is a pretrained base, an instruction-tuned version, and a GGUF.

On IFEval (instruction following), the 75M instruct model scores slightly higher than the original SmolLM-135M-Instruct with about half the parameters and a fraction of the pretraining data. (SmolLM was pretrained on 600B tokens and SmolLM2 on 2T tokens; KeyLM was pretrained on only 18B.)
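To put those data budgets in perspective, here is a quick back-of-the-envelope sketch of pretraining tokens seen per parameter. It uses only the figures quoted in this post; the dictionary layout and variable names are just for illustration, and SFT tokens are not counted.

```python
# Figures quoted in the post: (parameter count, pretraining tokens).
models = {
    "KeyLM-75M": (75e6, 18e9),
    "SmolLM-135M": (135e6, 600e9),
    "SmolLM2-135M": (135e6, 2e12),
}

# Pretraining tokens seen per model parameter.
tokens_per_param = {
    name: tokens / params for name, (params, tokens) in models.items()
}

for name, ratio in tokens_per_param.items():
    print(f"{name:<14} {ratio:>8,.0f} tokens per parameter")

# Share of SmolLM's pretraining token budget that KeyLM saw.
keylm_share_of_smollm = models["KeyLM-75M"][1] / models["SmolLM-135M"][1]
print(f"KeyLM token budget relative to SmolLM: {keylm_share_of_smollm:.1%}")
```

By this measure KeyLM saw roughly 240 tokens per parameter, and its pretraining budget is about 3% of SmolLM's.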

Model                  Params  IFEval
---------------------  ------  ------
KeyLM-75M-Instruct     75M     17.85
SmolLM-135M-Instruct   135M    17.15
SmolLM2-135M-Instruct  135M    26.98

The rest of the benchmarks, though, are about what you would expect from a model this size.

Bench                     Score
------------------------  -----
MMLU (acc)                24.0%
ARC-Easy (acc)            40.3%
ARC-Challenge (acc_norm)  22.6%
HellaSwag (acc_norm)      31.0%
PIQA (acc)                61.3%
WinoGrande (acc)          48.3%

The architecture is standard for a model this size: GQA (8 query / 2 KV heads), RoPE, SwiGLU, per-head QK-Norm, 24 layers, hidden 512, 2048 context, 12,020 ByteLevel BPE vocab, bf16.
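As a rough illustration of what those numbers imply, the sketch below counts the attention and embedding weights and estimates the KV cache for one full-context sequence. Two things are assumptions, not stated in the post: the head dimension is taken as hidden / query heads (64), and the projections are assumed to have no bias terms. The SwiGLU width is not given either, so the MLP is left out rather than guessed.

```python
# Figures taken from the post.
hidden = 512
n_layers = 24
n_q_heads = 8
n_kv_heads = 2
context = 2048
vocab = 12_020
bytes_per_value = 2  # bf16

# Assumption (not stated in the post): head_dim = hidden / n_q_heads.
head_dim = hidden // n_q_heads

# Attention projection weights per layer, assuming no bias terms.
q_and_o = 2 * hidden * (n_q_heads * head_dim)   # query and output projections
k_and_v = 2 * hidden * (n_kv_heads * head_dim)  # key and value projections
attn_params_per_layer = q_and_o + k_and_v

# Token embedding table (whether the output head is tied to it is not stated).
embedding_params = vocab * hidden


def kv_cache_bytes(kv_heads: int) -> int:
    """Bytes needed to cache keys and values for one full-context sequence."""
    return 2 * n_layers * kv_heads * head_dim * context * bytes_per_value


gqa_cache = kv_cache_bytes(n_kv_heads)
# Hypothetical comparison: plain multi-head attention, one KV head per query head.
mha_cache = kv_cache_bytes(n_q_heads)

print(f"attention params per layer: {attn_params_per_layer:,}")
print(f"embedding params:           {embedding_params:,}")
print(f"KV cache with 2 KV heads:   {gqa_cache / 2**20:.0f} MiB")
print(f"KV cache with 8 KV heads:   {mha_cache / 2**20:.0f} MiB")
```

Under these assumptions, sharing 2 KV heads across 8 query heads cuts the full-context KV cache from 96 MiB to 24 MiB per sequence.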

Trained on ~18B tokens of public data (FineWeb-Edu, Wikipedia, Reddit, StackExchange, WildChat, UltraChat, LMSYS, OASST2, Cosmopedia), then SFT on smol-smoltalk and smoltalk2.

Here are some sample outputs (KeyLM-75M-Instruct, greedy decoding):

Explain photosynthesis.

▎ Photosynthesis is the process plants use to turn sunlight, water, and carbon dioxide into energy. It also releases oxygen, which people and animals need to breathe.

What is the difference between weather and climate?

▎ Weather is what the atmosphere is doing over a short time, such as today's rain, wind, or temperature. Climate is the long-term pattern of weather in a region.

What is gravity?

▎ Gravity is the force that pulls objects toward each other. It keeps the Moon moving around Earth and Earth moving around the Sun.

Explain the water cycle.

▎ The sun warms water in oceans, lakes, rivers, and soil, causing some of it to evaporate into the air. The water vapor cools and forms clouds through condensation. When clouds become heavy, water falls as rain, snow, sleet, or hail.

What is inflation in economics?

▎ Inflation means prices are rising over time. If inflation is high, the same amount of money buys less than it did before.

Explain the concept of religion.

▎ Religion is a belief system that affects how people live, work, and interact with the world around them. It can shape culture, politics, and values.

Write a short thank-you note to a teacher.

▎ Thank you for your patience and support this year. I appreciate the time you put into helping me learn.
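The samples above were generated with greedy decoding, which means always picking the highest-scoring next token, so the same prompt gives the same output every time. Here is a minimal sketch of that loop. The `toy_logits` function is a hypothetical stand-in for a real model and has nothing to do with KeyLM's actual inference code.

```python
from typing import Callable, List


def greedy_decode(
    next_logits: Callable[[List[int]], List[float]],
    prompt: List[int],
    eos_id: int,
    max_new_tokens: int,
) -> List[int]:
    """Extend the prompt by repeatedly appending the highest-scoring token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        # Greedy step: take the argmax, with no sampling or temperature.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_logits(tokens: List[int]) -> List[float]:
    """Hypothetical model: always prefers (last token + 1) mod vocab size."""
    vocab_size = 5
    preferred = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == preferred else 0.0 for i in range(vocab_size)]


output = greedy_decode(toy_logits, prompt=[0], eos_id=4, max_new_tokens=10)
print(output)  # [0, 1, 2, 3, 4]
```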

The model is English-only and near random on knowledge benchmarks, so it will hallucinate basically all the time.
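To make "near random" concrete, the sketch below compares the benchmark scores reported in this post with the accuracy of uniform random guessing. The option counts are an assumption based on the standard formats of these benchmarks (four choices for MMLU, ARC, and HellaSwag; two for PIQA and WinoGrande). A few ARC questions have a different number of options, so 25% is only approximate there.

```python
# (reported score in %, assumed number of answer options)
results = {
    "MMLU": (24.0, 4),
    "ARC-Easy": (40.3, 4),
    "ARC-Challenge": (22.6, 4),
    "HellaSwag": (31.0, 4),
    "PIQA": (61.3, 2),
    "WinoGrande": (48.3, 2),
}

# Margin over the uniform-guessing baseline, in percentage points.
margins = {}
for bench, (score, n_options) in results.items():
    chance = 100.0 / n_options
    margins[bench] = score - chance
    print(
        f"{bench:<14} score {score:5.1f}%  chance {chance:5.1f}%  "
        f"margin {margins[bench]:+5.1f}"
    )
```

With these baselines, MMLU, ARC-Challenge, and WinoGrande land at or slightly below guessing, while ARC-Easy, HellaSwag, and PIQA sit above it.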

All three versions (Base, Instruct, GGUF) are published on Hugging Face so you can try them out for yourself:

submitted by /u/cakes_and_candles