r/LocalLLaMA

[NEW] Supra-50M Released!

https://preview.redd.it/kx39ammxno2h1.jpg?width=1080&format=pjpg&auto=webp&s=d1a2d5b27920a5b61a50547a6e70a6378445cae4

SupraLabs has released a new model: Supra-50M!

Supra-50M is a compact 50M-parameter causal language model, released in BASE and INSTRUCT versions, built from scratch by SupraLabs on a Llama-style architecture and trained on 20 billion tokens of high-quality educational web text. Despite being 2.5 to 5.4 times smaller than the open models compared below, it is competitive on several key benchmarks: it scores highest of the four on ARC-Easy and beats GPT-2 and SmolLM-135M on BLiMP and SciQ, while trailing all three on PIQA and beating only GPT-2 on HellaSwag. This is our first SupraLabs Scaling Up Plan model.

🤗 Supra-50M-Base | Supra-50M-Instruct

What comes next?

  • Supra-124M — Base, Chat, Experimental Reasoning
  • Supra-350M — Base, Chat, Reasoning, Coding

🏆 Benchmarks

| Benchmark | Supra-50M (ours) | GPT-2 (124M) | SmolLM-135M | OpenELM-270M |
|---|---|---|---|---|
| Parameters | 50M | 124M (2.5×) | 135M (2.7×) | 270M (5.4×) |
| BLiMP (linguistics) | 76.3% | 63.0% | 69.8% | N/A |
| SciQ (science) | 77.2% | 53.2% | 73.4% | 84.70% |
| ARC-Easy (knowledge) | 52.2% | 42.0% | 49.2% | 45.08% |
| PIQA (physical commonsense) | 62.2% | 63.0% | 67.3% | 69.75% |
| HellaSwag (commonsense completion) | 31.8% | 29.5% | 42.0% | 46.71% |

🧠 Architecture & Hyperparameters

| Hyperparameter | Value |
|---|---|
| Architecture | Llama (decoder-only transformer) |
| Parameters | ~50M |
| Vocab size | 32,000 |
| Hidden size | 512 |
| Intermediate size | 1,408 |
| Hidden layers | 12 |
| Attention heads | 8 |
| Key-value heads | 4 (GQA) |
| Max position embeddings | 1,024 |
| RoPE theta | 10,000 |
| Tied embeddings | Yes |
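As a sanity check on the "~50M" figure, the hyperparameters above are enough to count parameters by hand. This is our own arithmetic, not a figure published by SupraLabs, and it assumes a standard Hugging Face Llama block: no bias terms, a SwiGLU MLP with gate, up and down projections, two RMSNorms per layer plus a final one, and tied input/output embeddings as the table states. The second helper estimates the KV cache, which is where the 4 key-value heads pay off: with grouped-query attention each KV head is shared by 2 of the 8 query heads. Function names are hypothetical.

```python
# Hand count of parameters for the Llama-style config in the table.
# Assumptions: no biases, SwiGLU MLP (gate/up/down), two RMSNorms per layer
# plus a final RMSNorm, and tied input/output embeddings.

def llama_param_count(vocab=32_000, hidden=512, intermediate=1_408,
                      layers=12, heads=8, kv_heads=4, tied=True):
    head_dim = hidden // heads                       # 512 / 8 = 64
    attn = (hidden * hidden                          # q_proj
            + 2 * hidden * kv_heads * head_dim       # k_proj + v_proj (only 4 KV heads)
            + hidden * hidden)                       # o_proj
    mlp = 3 * hidden * intermediate                  # gate_proj, up_proj, down_proj
    norms = 2 * hidden                               # input + post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed = vocab * hidden                           # shared with the LM head when tied
    lm_head = 0 if tied else vocab * hidden
    final_norm = hidden
    return embed + layers * per_layer + final_norm + lm_head


def kv_cache_bytes(seq_len, layers=12, kv_heads=4, head_dim=64, bytes_per_value=2):
    # Keys and values for every layer and KV head, stored in bf16 (2 bytes each)
    return 2 * layers * kv_heads * head_dim * bytes_per_value * seq_len


print(f"{llama_param_count():,}")            # 51,786,240
print(f"{llama_param_count(tied=False):,}")  # 68,170,240
print(kv_cache_bytes(1_024))                 # 12582912 bytes (12 MiB) at full context
```

Under these assumptions the total is about 51.8M, and nearly a third of it (16.4M) is the tied 32,000 × 512 embedding matrix. With 8 KV heads instead of 4 the cache would double. If `transformers` is installed, building `LlamaForCausalLM(LlamaConfig(...))` with the same values and summing `p.numel()` over `model.parameters()` should give the same count.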

📚 Training Data

| Property | Value |
|---|---|
| Dataset | HuggingFaceFW/fineweb-edu (sample-100BT) |
| Total tokens | 20B |
| Sequence length | 1,024 tokens |
| Storage format | Memory-mapped binary (uint16, ~40 GB) |
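The ~40 GB figure follows from storing each of the 20B token ids as a 2-byte uint16 (20 × 10⁹ tokens × 2 bytes = 40 × 10⁹ bytes), which is possible because the 32,000-entry vocabulary fits below 2¹⁶ = 65,536. The post does not show its data loader, so this is only a sketch under assumptions: the file is taken to be one flat, headerless stream of native-endian uint16 ids, training windows are drawn at random offsets, and the helper names are hypothetical. It uses only the standard library (`mmap`, `array`), which is not necessarily what the authors used.

```python
import array
import mmap
import random

SEQ_LEN = 1_024        # training sequence length from the table
BYTES_PER_TOKEN = 2    # uint16
VOCAB_SIZE = 32_000    # must stay below 2**16 for uint16 storage


def write_token_file(path, token_ids):
    """Write token ids as one flat, headerless uint16 stream (assumed layout)."""
    if any(not 0 <= t < VOCAB_SIZE for t in token_ids):
        raise ValueError("token id outside vocabulary range")
    with open(path, "wb") as f:
        array.array("H", token_ids).tofile(f)


def sample_sequence(path, seq_len=SEQ_LEN, rng=random):
    """Memory-map the file and return one random contiguous window of ids."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_tokens = len(mm) // BYTES_PER_TOKEN
        if n_tokens < seq_len:
            raise ValueError("file is shorter than one sequence")
        start = rng.randrange(n_tokens - seq_len + 1)
        # Only the bytes of this window are paged in from disk
        raw = mm[start * BYTES_PER_TOKEN:(start + seq_len) * BYTES_PER_TOKEN]
        return array.array("H", raw).tolist()


def storage_bytes(n_tokens):
    return n_tokens * BYTES_PER_TOKEN


print(storage_bytes(20_000_000_000))  # 40000000000, i.e. ~40 GB
```

Memory-mapping means the 40 GB file never has to fit in RAM; the OS pages in only the windows that are actually read.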

🔤 Tokenizer

Custom Byte-Level BPE tokenizer trained from scratch on 500,000 documents sampled from fineweb-edu (sample-10BT).

| Property | Value |
|---|---|
| Type | ByteLevelBPETokenizer |
| Vocabulary size | 32,000 |
| Min frequency | 2 |
| Special tokens | <s>, <pad>, </s>, <unk>, <mask> |
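The settings above correspond to the Hugging Face `tokenizers` library's `ByteLevelBPETokenizer`. The post does not include its training script, so this is a sketch under assumptions: the streaming helper takes the first 500,000 documents of the `sample-10BT` config rather than a random sample, it assumes the text lives in a `text` column, and the helper names are hypothetical. Passing the special tokens in this order should give them ids 0 to 4.

```python
from itertools import islice

from tokenizers import ByteLevelBPETokenizer

SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]


def iter_fineweb_edu(n_docs=500_000):
    """Stream raw text from fineweb-edu (assumed 'text' column); needs `datasets`."""
    from datasets import load_dataset  # lazy import so training works without it
    ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                      split="train", streaming=True)
    for row in islice(ds, n_docs):  # first n_docs rows, not a random sample
        yield row["text"]


def train_tokenizer(texts, vocab_size=32_000, min_frequency=2):
    """Train a byte-level BPE tokenizer with the settings from the table."""
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train_from_iterator(
        texts,
        vocab_size=vocab_size,
        min_frequency=min_frequency,  # a pair must occur at least twice to be merged
        special_tokens=SPECIAL_TOKENS,
        show_progress=False,
    )
    return tokenizer
```

For a full run, something like `train_tokenizer(iter_fineweb_edu()).save_model("supra-tokenizer")` would write `vocab.json` and `merges.txt` into an existing directory (the directory name is hypothetical). Because byte-level BPE starts from all 256 byte values, any input text can be encoded without falling back to `<unk>`.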

⚙️ Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 32 |
| Gradient accumulation steps | 4 |
| Effective batch size | 128 sequences × 1,024 tokens (131,072 tokens per step) |
| Learning rate | 6e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 2% |
| Optimizer | AdamW, fused (β1=0.9, β2=0.95) |
| Weight decay | 0.1 |
| Max grad norm | 1.0 |
| Precision | bfloat16 |
| torch.compile | Enabled |
| Hardware | Single GPU |
| Final loss | 3.259 |
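The table implies a few derived numbers. On one GPU each optimizer step sees 32 × 4 = 128 sequences of 1,024 tokens, i.e. 131,072 tokens, so one epoch over 20B tokens is about 152,587 steps and a 2% warmup is about 3,052 of them. The sketch reproduces that arithmetic and the learning-rate curve. The post only says "Cosine", so the schedule shape is an assumption: linear warmup from zero followed by a half-cosine decay to zero, which is what `transformers`' `get_cosine_schedule_with_warmup` does (and what a Trainer with `lr_scheduler_type="cosine"` and `warmup_ratio` would use). "20B" is treated as exactly 20 × 10⁹, and the final loss is assumed to be natural-log cross-entropy per token.

```python
import math

# Values taken from the training-configuration table
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 4
NUM_GPUS = 1
SEQ_LEN = 1_024
TOTAL_TOKENS = 20_000_000_000  # "20B", treated as exact (assumption)
PEAK_LR = 6e-4
WARMUP_RATIO = 0.02


def tokens_per_step():
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS * SEQ_LEN


def total_steps(total_tokens=TOTAL_TOKENS):
    # Integer division: count whole optimizer steps in one epoch
    return total_tokens // tokens_per_step()


def warmup_steps(num_steps, ratio=WARMUP_RATIO):
    # Rounded up, as the transformers Trainer does for warmup_ratio
    return math.ceil(num_steps * ratio)


def lr_at(step, num_steps, peak_lr=PEAK_LR, ratio=WARMUP_RATIO):
    """Assumed schedule: linear warmup from 0, then half-cosine decay to 0."""
    warm = warmup_steps(num_steps, ratio)
    if step < warm:
        return peak_lr * step / max(1, warm)
    progress = (step - warm) / max(1, num_steps - warm)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def perplexity(loss):
    # Assumes loss is mean cross-entropy in nats per token
    return math.exp(loss)


n = total_steps()
print(tokens_per_step(), n, warmup_steps(n))  # 131072 152587 3052
print(f"{perplexity(3.259):.1f}")             # 26.0
```

Under these assumptions the learning rate peaks at 6e-4 around step 3,052 and reaches zero at the last step, and the final loss of 3.259 corresponds to a training perplexity of roughly 26.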

🚀 Inference — Instruct version

import os
import warnings

# Silence TensorFlow and transformers log noise before importing the libraries
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore", category=UserWarning, module="transformers")

import torch
from transformers import pipeline, AutoTokenizer, logging

logging.set_verbosity_error()

MODEL_ID = "SupraLabs/Supra-50M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)
pipe = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,
    device_map="auto",
    # bfloat16 on GPU, full precision on CPU
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
)


def build_prompt(instruction, input_text=""):
    # Alpaca-style template; the "### Input" section is added only when context is given
    if input_text.strip():
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


def generate(instruction, input_text=""):
    result = pipe(
        build_prompt(instruction, input_text),
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.9,
        repetition_penalty=1.15,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
        return_full_text=False,  # return only the response, not the prompt
    )
    return result[0]["generated_text"].strip()


# Interactive loop: type "exit" to quit
while True:
    print("\nEnter an instruction (or 'exit' to quit):")
    user_input = input().strip()
    if user_input.lower() == "exit":
        break
    print("\nEnter additional context (optional, press Enter to skip):")
    context_input = input().strip()
    print(f"\nResponse:\n{generate(user_input, context_input)}\n")

Base version

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-50M_BASE",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)


def generate_text(prompt, max_new_tokens=150):
    # The base model simply continues the prompt; the returned text includes the prompt
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
    )
    return result[0]["generated_text"]


prompt = "The importance of education is"
print(f"Prompt: {prompt}\n" + "-" * 40)
print("\nOutput:\n" + generate_text(prompt))
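Both inference snippets sample with `temperature`, `top_k`, `top_p` and `repetition_penalty`. To show roughly what those knobs do to the next-token distribution, here is a simplified pure-Python sketch over a plain list of logits. It applies them in the order `transformers` does (repetition penalty, then temperature, top-k and top-p), but it is an illustration, not the library's code: the function names are hypothetical, and the defaults are simply the Instruct snippet's values.

```python
import math
import random


def apply_repetition_penalty(logits, context_ids, penalty):
    # CTRL-style: for tokens already in the context (prompt + generated so far),
    # divide positive logits and multiply negative ones by the penalty
    out = list(logits)
    for tok in set(context_ids):
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


def softmax(logits):
    m = max(logits)  # subtract the max for stability; exp(-inf) is 0.0
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k(logits, k):
    # Keep the k highest logits (ties with the k-th are kept too); mask the rest
    if k <= 0 or k >= len(logits):
        return list(logits)
    kth = sorted(logits, reverse=True)[k - 1]
    return [x if x >= kth else -math.inf for x in logits]


def filter_top_p(logits, p):
    # Nucleus filtering: keep the most likely tokens until their mass reaches p
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


def sample_next(logits, context_ids, temperature=0.7, top_k=50, top_p=0.9,
                repetition_penalty=1.15, rng=random):
    logits = apply_repetition_penalty(logits, context_ids, repetition_penalty)
    logits = [x / temperature for x in logits]  # <1 sharpens, >1 flattens
    logits = filter_top_k(logits, top_k)
    logits = filter_top_p(logits, top_p)
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With the Instruct settings a token has to survive both the top-50 cut and the 90% nucleus to be sampled at all; the base snippet's lower temperature (0.5), smaller `top_k` (25) and stronger repetition penalty (1.2) make its continuations more conservative and less repetitive.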

💬 Sample Outputs

Prompt: "The main concept of physics is "

Prompt: "Artificial intelligence is "

Prompt: "Once upon a time, "

First model in the SupraLabs Scaling Up Plan. Feedback welcome!

submitted by /u/Dangerous_Try3619