r/LocalLLaMA · May 22, 2026 · 3 min read

[NEW] Supra-50M Released!

#model-release

https://preview.redd.it/kx39ammxno2h1.jpg?width=1080&format=pjpg&auto=webp&s=d1a2d5b27920a5b61a50547a6e70a6378445cae4

SupraLabs has released a new model: Supra-50M.

Supra-50M is a compact 50M-parameter causal language model, available in BASE and INSTRUCT versions, built from scratch by SupraLabs on a Llama-style architecture and trained on 20 billion tokens of high-quality educational web text. Although it has less than half the parameters of the models it is compared against, it beats GPT-2 (124M) and SmolLM-135M on BLiMP, SciQ, and ARC-Easy, while trailing SmolLM-135M and OpenELM-270M on PIQA and HellaSwag. This is the first model in our SupraLabs Scaling Up Plan.

🤗 Supra-50M-Base | Supra-50M-Instruct

What comes next?

Supra-124M — Base, Chat, Experimental Reasoning
Supra-350M — Base, Chat, Reasoning, Coding

🏆 Benchmarks

Benchmark	Supra-50M (ours)	GPT-2 (124M)	SmolLM-135M	OpenELM-270M
Parameters	50M	124M (2.5×)	135M (2.7×)	270M (5.4×)
BLiMP (linguistics)	76.3%	63.0%	69.8%	N/A
SciQ (science)	77.2%	53.2%	73.4%	84.70%
ARC-Easy (knowledge)	52.2%	42.0%	49.2%	45.08%
PIQA (physical commonsense)	62.2%	63.0%	67.3%	69.75%
HellaSwag (commonsense completion)	31.8%	29.5%	42.0%	46.71%

🧠 Architecture & Hyperparameters

Hyperparameter	Value
Architecture	Llama (decoder-only transformer)
Parameters	~50M
Vocab size	32,000
Hidden size	512
Intermediate size	1,408
Hidden layers	12
Attention heads	8
Key-value heads	4 (GQA)
Max position embeddings	1,024
RoPE theta	10,000
Tied embeddings	Yes
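
As a sanity check on the ~50M figure, the parameter count can be reproduced from the table above. This is a rough sketch that assumes a standard Llama block (bias-free q/k/v/o projections, a gated MLP with gate, up, and down projections, and two RMSNorm weight vectors per layer); the post does not spell out these details, so treat the breakdown as an estimate.

```python
# Hyperparameters copied from the architecture table.
VOCAB_SIZE = 32_000
HIDDEN = 512
INTERMEDIATE = 1_408
LAYERS = 12
HEADS = 8
KV_HEADS = 4

head_dim = HIDDEN // HEADS      # 64 dimensions per attention head
kv_dim = KV_HEADS * head_dim    # 256: GQA shrinks the key/value projections

# Assumption: standard Llama block with no bias terms.
attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q_proj and o_proj, plus k_proj and v_proj
mlp = 3 * HIDDEN * INTERMEDIATE                   # gate_proj, up_proj, down_proj
norms = 2 * HIDDEN                                # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

embeddings = VOCAB_SIZE * HIDDEN  # counted once because input and output embeddings are tied
final_norm = HIDDEN
total = LAYERS * per_layer + embeddings + final_norm

print(f"per layer:  {per_layer:,}")
print(f"embeddings: {embeddings:,}")
print(f"total:      {total:,}")
```

Under these assumptions the total comes to 51,786,240 parameters, of which roughly a third (16,384,000) sits in the tied embedding matrix.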

📚 Training Data

Property	Value
Dataset	HuggingFaceFW/fineweb-edu (`sample-100BT`)
Total tokens	20B
Sequence length	1,024 tokens
Storage format	Memory-mapped binary (`uint16`, ~40 GB)
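
The ~40 GB figure follows from the numbers above: 20B tokens at 2 bytes each, and `uint16` is enough because every ID in a 32,000-token vocabulary fits below 65,536. Below is a minimal sketch of reading fixed-length windows from such a file using only the standard library. The flat layout (one long array of token IDs in native byte order), the `read_window` helper, and the demo file are assumptions for illustration, not the actual SupraLabs data loader.

```python
import mmap
import os
import tempfile
from array import array

SEQ_LEN = 1_024
BYTES_PER_TOKEN = 2  # uint16 holds IDs up to 65,535, enough for a 32,000-token vocab

# Size check for the full corpus: 20B tokens * 2 bytes = 40 GB.
corpus_bytes = 20_000_000_000 * BYTES_PER_TOKEN


def read_window(path, index, seq_len=SEQ_LEN):
    """Return the index-th non-overlapping window of token IDs from a flat uint16 file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = index * seq_len * BYTES_PER_TOKEN
            # Slicing the map copies only this window, not the whole file.
            chunk = mm[start:start + seq_len * BYTES_PER_TOKEN]
    tokens = array("H")  # "H" is an unsigned 16-bit integer on common platforms
    tokens.frombytes(chunk)
    return tokens.tolist()


# Demo on a tiny hypothetical file holding three windows of consecutive IDs.
demo_ids = array("H", range(3 * SEQ_LEN))
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(demo_ids.tobytes())
    demo_path = tmp.name

second = read_window(demo_path, 1)
os.remove(demo_path)

print(f"corpus size: {corpus_bytes / 1e9:.0f} GB")
print(f"window length: {len(second)}, first id: {second[0]}")
```

Memory mapping lets the operating system page in only the windows that are actually read, which is why a 40 GB file can be sampled without loading it into RAM.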

🔤 Tokenizer

Custom Byte-Level BPE tokenizer trained from scratch on 500,000 documents sampled from fineweb-edu (sample-10BT).

Property	Value
Type	ByteLevelBPETokenizer
Vocabulary size	32,000
Min frequency	2
Special tokens	`<s>`, `<pad>`, `</s>`, `<unk>`, `<mask>`

⚙️ Training Configuration

Parameter	Value
Epochs	1
Per-device batch size	32
Gradient accumulation steps	4
Effective batch size	128 sequences × 1,024 tokens (131,072 tokens per step)
Learning rate	6e-4
LR scheduler	Cosine
Warmup ratio	2%
Optimizer	AdamW Fused (β1=0.9, β2=0.95)
Weight decay	0.1
Max grad norm	1.0
Precision	bfloat16
torch.compile	Enabled
Hardware	Single GPU
Final loss	3.259
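
The settings above imply a step count and a learning-rate curve that can be worked out by hand. The sketch below assumes linear warmup followed by cosine decay to zero, and assumes the final loss is a mean per-token cross-entropy in nats. The post only states "Cosine", "2%", and the loss value, so the exact schedule shape and the perplexity reading are assumptions.

```python
import math

# Values copied from the training tables.
TOTAL_TOKENS = 20_000_000_000
SEQ_LEN = 1_024
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 4
PEAK_LR = 6e-4
WARMUP_RATIO = 0.02
FINAL_LOSS = 3.259

tokens_per_step = PER_DEVICE_BATCH * GRAD_ACCUM * SEQ_LEN  # 131,072 on a single GPU
total_steps = TOTAL_TOKENS // tokens_per_step              # optimizer steps in one epoch
warmup_steps = int(WARMUP_RATIO * total_steps)


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: the reported loss is mean cross-entropy per token, in nats.
perplexity = math.exp(FINAL_LOSS)

print(f"tokens per step: {tokens_per_step:,}")
print(f"total steps:     {total_steps:,}")
print(f"warmup steps:    {warmup_steps:,}")
print(f"lr at midpoint:  {lr_at(total_steps // 2):.2e}")
print(f"perplexity:      {perplexity:.1f}")
```

With these numbers, one epoch is about 152,587 optimizer steps, the first 3,051 of which are warmup, and a loss of 3.259 corresponds to a perplexity of roughly 26.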

🚀 Inference — Instruct version

    import os, warnings

    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
    warnings.filterwarnings("ignore", category=UserWarning, module="transformers")

    import torch
    from transformers import pipeline, AutoTokenizer, logging

    logging.set_verbosity_error()

    MODEL_ID = "SupraLabs/Supra-50M-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)
    pipe = pipeline(
        "text-generation",
        model=MODEL_ID,
        tokenizer=tokenizer,
        device_map="auto",
        torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
    )

    def build_prompt(instruction, input_text=""):
        if input_text.strip():
            return (
                "Below is an instruction that describes a task, paired with an input "
                "that provides further context. Write a response that appropriately "
                "completes the request.\n\n"
                f"### Instruction:\n{instruction}\n\n"
                f"### Input:\n{input_text}\n\n### Response:\n"
            )
        return (
            "Below is an instruction that describes a task. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Response:\n"
        )

    def generate(instruction, input_text=""):
        result = pipe(
            build_prompt(instruction, input_text),
            max_new_tokens=512,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.9,
            repetition_penalty=1.15,
            pad_token_id=pipe.tokenizer.pad_token_id,
            eos_token_id=pipe.tokenizer.eos_token_id,
            return_full_text=False
        )
        return result[0]['generated_text'].strip()

    while True:
        print("\nEnter an instruction (or 'exit' to quit):")
        user_input = input().strip()
        if user_input.lower() == "exit":
            break
        print("\nEnter additional context (optional, press Enter to skip):")
        context_input = input().strip()
        print(f"\nResponse:\n{generate(user_input, context_input)}\n")

Base version

    from transformers import pipeline
    import torch

    pipe = pipeline(
        "text-generation",
        model="SupraLabs/Supra-50M_BASE",
        device_map="auto",
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
    )

    def generate_text(prompt, max_new_tokens=150):
        result = pipe(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.5,
            top_k=25,
            top_p=0.9,
            repetition_penalty=1.2,
            pad_token_id=pipe.tokenizer.pad_token_id,
            eos_token_id=pipe.tokenizer.eos_token_id
        )
        return result[0]['generated_text']

    prompt = "The importance of education is"
    print(f"Prompt: {prompt}\n" + "-" * 40)
    print("\nOutput:\n" + generate_text(prompt))

💬 Sample Outputs

Prompt: "The main concept of physics is "

Prompt: "Artificial intelligence is "

Prompt: "Once upon a time, "

First model in the SupraLabs Scaling Up Plan. Feedback welcome!

submitted by /u/Dangerous_Try3619