SupraLabs released a new model! - Supra-50M
Supra-50M is a compact 50M-parameter causal language model (BASE and INSTRUCT versions) built from scratch by SupraLabs using a Llama-style architecture and trained on 20 billion tokens of high-quality educational web text. Despite being 2.5× to 5.4× smaller than the open models in the benchmark table below, it scores higher than GPT-2 (124M) and SmolLM-135M on BLiMP, SciQ, and ARC-Easy. It trails all three baselines on PIQA, and on HellaSwag it beats only GPT-2. This is the first model in our SupraLabs Scaling Up Plan.
🤗 Supra-50M-Base | Supra-50M-Instruct
What comes next?
- Supra-124M — Base, Chat, Experimental Reasoning
- Supra-350M — Base, Chat, Reasoning, Coding
🏆 Benchmarks
| Benchmark | Supra-50M (ours) | GPT-2 (124M) | SmolLM-135M | OpenELM-270M |
|---|---|---|---|---|
| Parameters | 50M | 124M (2.5×) | 135M (2.7×) | 270M (5.4×) |
| BLiMP (linguistics) | 76.3% | 63.0% | 69.8% | N/A |
| SciQ (science) | 77.2% | 53.2% | 73.4% | 84.70% |
| ARC-Easy (knowledge) | 52.2% | 42.0% | 49.2% | 45.08% |
| PIQA (physical commonsense) | 62.2% | 63.0% | 67.3% | 69.75% |
| HellaSwag (context) | 31.8% | 29.5% | 42.0% | 46.71% |
🧠 Architecture & Hyperparameters
| Hyperparameter | Value |
|---|---|
| Architecture | Llama (decoder-only transformer) |
| Parameters | ~50M |
| Vocab size | 32,000 |
| Hidden size | 512 |
| Intermediate size | 1,408 |
| Hidden layers | 12 |
| Attention heads | 8 |
| Key-value heads | 4 (GQA) |
| Max position embeddings | 1,024 |
| RoPE theta | 10,000 |
| Tied embeddings | Yes |
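As a sanity check on the ~50M figure, the parameter count can be estimated directly from the table above. This is a rough sketch that assumes a standard Llama block: no bias terms, an MLP with gate/up/down projections, two RMSNorm weight vectors per layer, and one final norm. Those layout details are assumptions rather than something the post confirms, and `llama_param_count` is a name made up for illustration.

```python
def llama_param_count(vocab=32_000, hidden=512, intermediate=1_408,
                      layers=12, heads=8, kv_heads=4, tied=True):
    """Estimate parameters for a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden // heads            # 512 / 8 = 64
    kv_dim = kv_heads * head_dim          # GQA: K and V projections are narrower than Q

    embed = vocab * hidden                # token embedding matrix
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q and o, then k and v
    mlp = 3 * hidden * intermediate       # gate, up, and down projections
    norms = 2 * hidden                    # two RMSNorm weight vectors per layer

    total = embed + layers * (attn + mlp + norms) + hidden  # last term: final norm
    if not tied:
        total += vocab * hidden           # separate output head when embeddings are untied
    return total


if __name__ == "__main__":
    n = llama_param_count()
    print(f"{n:,} parameters (~{n / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to about 51.8M, of which the tied 32,000 × 512 embedding matrix accounts for roughly 16.4M.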
📚 Training Data
| Property | Value |
|---|---|
| Dataset | HuggingFaceFW/fineweb-edu (sample-100BT) |
| Total tokens | 20B |
| Sequence length | 1,024 tokens |
| Storage format | Memory-mapped binary (uint16, ~40 GB) |
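The ~40 GB figure follows from the numbers in the table: a 32,000-entry vocabulary fits in an unsigned 16-bit integer, so each token costs 2 bytes. The short sketch below works through that arithmetic; the constant and function names are illustrative, and it assumes the file holds nothing but raw token ids.

```python
VOCAB_SIZE = 32_000
TOTAL_TOKENS = 20_000_000_000
SEQ_LEN = 1_024
BYTES_PER_TOKEN = 2  # uint16


def corpus_stats(total_tokens=TOTAL_TOKENS, seq_len=SEQ_LEN,
                 bytes_per_token=BYTES_PER_TOKEN):
    """Return (size in GB, number of full training sequences)."""
    size_gb = total_tokens * bytes_per_token / 1e9
    n_sequences = total_tokens // seq_len
    return size_gb, n_sequences


def fits_uint16(vocab_size=VOCAB_SIZE):
    """True if the largest token id can be stored in an unsigned 16-bit integer."""
    return vocab_size - 1 <= 2**16 - 1


if __name__ == "__main__":
    size_gb, n_sequences = corpus_stats()
    print(f"uint16 is wide enough: {fits_uint16()}")
    print(f"{size_gb:.0f} GB on disk, {n_sequences:,} sequences of {SEQ_LEN} tokens")
```

A file like this is typically opened with a memory-mapping reader (for example NumPy's `memmap`) so that training can slice 1,024-token windows without loading all 40 GB into RAM; whether the authors did exactly that is not stated.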
🔤 Tokenizer
Custom Byte-Level BPE tokenizer trained from scratch on 500,000 documents sampled from fineweb-edu (sample-10BT).
| Property | Value |
|---|---|
| Type | ByteLevelBPETokenizer |
| Vocabulary size | 32,000 |
| Min frequency | 2 |
| Special tokens | <s>, <pad>, </s>, <unk>, <mask> |
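For readers who want to reproduce a tokenizer with these settings, here is a minimal sketch using the Hugging Face `tokenizers` library. It is not the authors' training script: the `train_tokenizer` helper and the toy corpus are placeholders, and the real run would stream 500,000 fineweb-edu documents instead.

```python
from tokenizers import ByteLevelBPETokenizer

# Special tokens in the order listed in the table above.
SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]


def train_tokenizer(texts, vocab_size=32_000, min_frequency=2):
    """Train a byte-level BPE tokenizer on an iterable of strings."""
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train_from_iterator(
        texts,
        vocab_size=vocab_size,
        min_frequency=min_frequency,
        special_tokens=SPECIAL_TOKENS,
        show_progress=False,
    )
    return tokenizer


if __name__ == "__main__":
    # Toy corpus (placeholder); a tiny vocab keeps the demo fast.
    toy_corpus = ["Plants convert light into chemical energy."] * 20
    tok = train_tokenizer(toy_corpus, vocab_size=300)
    enc = tok.encode("Plants convert light.")
    print(enc.tokens)
    print(tok.decode(enc.ids))
```

Byte-level BPE starts from all 256 byte values, so any input can be encoded without falling back to `<unk>`.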
⚙️ Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 32 |
| Gradient accumulation steps | 4 |
| Effective batch size | 128 sequences × 1,024 tokens (131,072 tokens per step) |
| Learning rate | 6e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 2% |
| Optimizer | AdamW Fused (β1=0.9, β2=0.95) |
| Weight decay | 0.1 |
| Max grad norm | 1.0 |
| Precision | bfloat16 |
| torch.compile | Enabled |
| Hardware | Single GPU |
| Final loss | 3.259 |
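The configuration above implies a step count and a learning-rate curve that can be worked out by hand. The sketch below assumes linear warmup followed by cosine decay to zero, which is a common default; the post only says "Cosine" with a 2% warmup ratio, so the exact shape and the final learning rate are assumptions. All names are illustrative.

```python
import math

PER_DEVICE_BATCH = 32
GRAD_ACCUM = 4
SEQ_LEN = 1_024
TOTAL_TOKENS = 20_000_000_000
PEAK_LR = 6e-4
WARMUP_RATIO = 0.02

# Single GPU: 32 sequences x 4 accumulation steps x 1,024 tokens per sequence.
tokens_per_step = PER_DEVICE_BATCH * GRAD_ACCUM * SEQ_LEN
total_steps = math.ceil(TOTAL_TOKENS / tokens_per_step)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"{tokens_per_step:,} tokens per optimizer step")
    print(f"{total_steps:,} steps for one epoch, {warmup_steps:,} of them warmup")
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(f"step {step:>7,}: lr = {lr_at(step):.2e}")
```

With these numbers, one epoch over 20B tokens is roughly 152,588 optimizer steps, about 3,052 of which are warmup.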
🚀 Inference — Instruct version
```python
import os
import warnings

# Silence TensorFlow and transformers log noise before the heavy imports.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore", category=UserWarning, module="transformers")

import torch
from transformers import pipeline, AutoTokenizer, logging

logging.set_verbosity_error()

MODEL_ID = "SupraLabs/Supra-50M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)
pipe = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,
    device_map="auto",
    # bfloat16 on GPU, float32 on CPU
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
)


def build_prompt(instruction, input_text=""):
    """Wrap the instruction (and optional input) in the Alpaca-style template."""
    if input_text.strip():
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


def generate(instruction, input_text=""):
    """Sample a response and return only the newly generated text."""
    result = pipe(
        build_prompt(instruction, input_text),
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.9,
        repetition_penalty=1.15,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
        return_full_text=False,
    )
    return result[0]["generated_text"].strip()


# Simple interactive loop; type 'exit' to quit.
while True:
    print("\nEnter an instruction (or 'exit' to quit):")
    user_input = input().strip()
    if user_input.lower() == "exit":
        break
    print("\nEnter additional context (optional, press Enter to skip):")
    context_input = input().strip()
    print(f"\nResponse:\n{generate(user_input, context_input)}\n")
```
Base version
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-50M_BASE",
    device_map="auto",
    # float16 on GPU, float32 on CPU
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)


def generate_text(prompt, max_new_tokens=150):
    """Sample a continuation; the returned text includes the prompt."""
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
    )
    return result[0]["generated_text"]


prompt = "The importance of education is"
print(f"Prompt: {prompt}\n" + "-" * 40)
print("\nOutput:\n" + generate_text(prompt))
```
💬 Sample Outputs
Prompt: "The main concept of physics is "
Prompt: "Artificial intelligence is "
Prompt: "Once upon a time, "
First model in the SupraLabs Scaling Up Plan. Feedback welcome!