[NEW FAMILY OF MODELS] Supra1.5 family just released!
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
SupraLabs just released the Supra-1.5-exp line: Base, Instruct, and GGUF! (Reasoning coming soon.)
Hey r/LocalLLaMA! We are releasing the experimental Supra-1.5-50M family today: a new Base model with 5x the context window of the original Supra-50M, an Instruct fine-tune on top of it, and a GGUF quantized version ready to run anywhere.
🤗 Supra-1.5-50M-Base-exp | 🤗 Supra-1.5-50M-Instruct-exp | 🤗 GGUF | Supra1.5 50M Instruct Demo
These are experimental releases and part of Project Chimera.
This model uses Alpaca chat format!
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
[INSTRUCTION]

### Response:
```
With additional input:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
[INSTRUCTION]

### Input:
[CONTEXT]

### Response:
```
----
What changed from Supra-50M?
The biggest upgrade is context. Supra-1.5 expands from 1,024 to 5,120 tokens using RoPE scaling, with continued pretraining on a 3B token mix of tool calling data, ChatML conversations, factual text, and math. Same architecture, same tokenizer, just a much better base for SFT and future RL work.
| Spec | Supra-50M | Supra-1.5-50M |
|---|---|---|
| Context length | 1,024 tokens | 5,120 tokens |
| Training data (CPT) | 20B tokens (pretraining) | 3T tokens (continued) (experimental 1T) |
| Data mix | Fineweb-Edu only | Tool calling, ChatML, factual, math |
| Instruct format | Alpaca | ChatML |
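The post does not say which RoPE scaling variant was used. As a rough illustration only, the sketch below assumes linear position interpolation, one common approach: position indices are divided by 5,120 / 1,024 = 5 so that the longest new position maps back into the range the original model was trained on. The `head_dim` and `base` defaults are hypothetical values chosen for the example, not taken from the release.

```python
ORIG_CTX = 1024   # context length of Supra-50M (from the post)
NEW_CTX = 5120    # context length of Supra-1.5 (from the post)
SCALE = NEW_CTX / ORIG_CTX  # 5.0


def rope_angles(position, head_dim=64, base=10000.0, scale=1.0):
    """Return the rotary angle for each dimension pair at one position.

    head_dim and base are hypothetical defaults. scale > 1 applies linear
    position interpolation, an assumed variant not confirmed by the post.
    """
    scaled_pos = position / scale
    return [scaled_pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# The last position of the new window, after scaling, falls inside the
# position range covered by the original 1,024-token window.
print(round((NEW_CTX - 1) / SCALE, 1))  # 1023.8
```

Under this assumption the model never sees a rotary angle larger than those it met during the original pretraining, which is why continued pretraining at the longer length can adapt it with relatively little data.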
Benchmarks (Instruct)
BLiMP sits at a consistent 67.4 across evaluations. The model also showed an interesting raw vs. normalized accuracy split: science and factual tasks perform better under raw inference, while math and logic tasks benefit from normalized inference. Make of that what you will for a 50M model.
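The raw vs. normalized split presumably refers to the `acc` and `acc_norm` style metrics that multiple-choice evaluation harnesses report; that is an assumption, since the post does not name the tooling. In that convention, raw accuracy picks the answer with the highest total log-likelihood, while normalized accuracy divides each log-likelihood by the answer's length first. A minimal sketch with made-up scores (not model outputs) shows how the two can disagree:

```python
def pick_raw(choices, logprobs):
    """Index of the choice with the highest total log-likelihood."""
    return max(range(len(choices)), key=lambda i: logprobs[i])


def pick_normalized(choices, logprobs):
    """Index of the choice with the highest log-likelihood per UTF-8 byte.

    Byte length is an assumed normalizer; harnesses differ in the details.
    """
    return max(
        range(len(choices)),
        key=lambda i: logprobs[i] / len(choices[i].encode("utf-8")),
    )


# Invented scores: the longer answer has a lower total log-likelihood
# but a higher log-likelihood per byte.
choices = ["4", "the answer is four"]
logprobs = [-2.0, -9.0]

print(pick_raw(choices, logprobs))         # 0
print(pick_normalized(choices, logprobs))  # 1
```

Raw scoring favors short answers, while normalization removes that length penalty, so which one looks better can depend on how answer lengths are distributed in a given task.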
The model is already listed on the Open SLM Leaderboard by AxiomicLabs.
Quick start
Base model:
```python
from transformers import pipeline
import torch

print("[*] Loading Supra-1.5-50M Base...")
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-1.5-50M-Base-exp",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)

def generate_text(prompt, max_new_tokens=150):
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id
    )
    return result[0]['generated_text']

print(generate_text("The importance of education is"))
```

Instruct model:
```python
import os, warnings
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore", category=UserWarning, module="transformers")

import torch
from transformers import pipeline, AutoTokenizer, logging
logging.set_verbosity_error()

MODEL_ID = "SupraLabs/Supra-1.5-50M-Instruct-exp"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)
pipe = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,
    device_map="auto",
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
)

def build_prompt(instruction, input_text=""):
    if input_text.strip():
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

def generate(instruction, input_text=""):
    result = pipe(
        build_prompt(instruction, input_text),
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.9,
        repetition_penalty=1.15,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
        return_full_text=False
    )
    return result[0]['generated_text'].strip()

while True:
    print("\nEnter an instruction (or 'exit' to quit):")
    user_input = input().strip()
    if user_input.lower() == "exit":
        break
    print("\nEnter additional context (optional, press Enter to skip):")
    context_input = input().strip()
    print(f"\nResponse:\n{generate(user_input, context_input)}\n")
```

GGUF quantizations:
| Bits | Quant | Size |
|---|---|---|
| 1-bit | Q1_D | 19.6 MB |
| 1-bit | TQ1_0 | 25.1 MB |
| 2-bit | Q2_K | 28.8 MB |
| 2-bit | TQ2_0 | 26.4 MB |
| 3-bit | IQ3_S | 31 MB |
| 3-bit | Q3_K_S | 31 MB |
| 3-bit | IQ3_M | 31.7 MB |
| 3-bit | Q3_K_M | 32.7 MB |
| 3-bit | Q3_K_L | 33.8 MB |
| 4-bit | IQ4_XS | 33.8 MB |
| 4-bit | Q4_K_S | 35.7 MB |
| 4-bit | IQ4_NL | 34.7 MB |
| 4-bit | Q4_0 | 34.5 MB |
| 4-bit | Q4_1 | 36.8 MB |
| 4-bit | Q4_K_M | 37.4 MB |
| 5-bit | Q5_K_S | 39.5 MB |
| 5-bit | Q5_0 | 39 MB |
| 5-bit | Q5_1 | 41.2 MB |
| 5-bit | Q5_K_M | 41 MB |
| 6-bit | Q6_K | 45.8 MB |
| 8-bit | Q8_0 | 56.2 MB |
| 16-bit | BF16 | 105 MB |
| 16-bit | F16 (recommended) | 105 MB |
| 32-bit | F32 (recommended) | 208 MB |
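The file sizes in the table above give a rough feel for effective bits per weight. The sketch below is a back-of-the-envelope estimate under stated assumptions: "MB" is taken to mean 10^6 bytes, GGUF metadata overhead is ignored, and the parameter count is estimated from the F32 file at 4 bytes per weight, since the post does not give an exact count.

```python
# File sizes in MB, copied from the quantization table.
SIZES_MB = {"Q4_K_M": 37.4, "Q8_0": 56.2, "F16": 105.0, "F32": 208.0}

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def estimated_params(f32_size_mb):
    """Estimate the parameter count from the F32 file (4 bytes per weight)."""
    return f32_size_mb * BYTES_PER_MB / 4


def bits_per_weight(size_mb, n_params):
    """Effective bits per weight, counting every file byte as weight data."""
    return size_mb * BYTES_PER_MB * 8 / n_params


n_params = estimated_params(SIZES_MB["F32"])
for name, size in SIZES_MB.items():
    print(f"{name}: {bits_per_weight(size, n_params):.1f} bits/weight")
```

With these figures the estimate comes out at roughly 52M parameters, and Q4_K_M lands near 5.8 bits per weight rather than a flat 4, so the nominal bit labels should be read as approximate at this model size.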
GGUF with llama.cpp:
```bash
# Run directly (replace Q4_K_M with your preferred quant)
llama-cli -hf SupraLabs/Supra-1.5-50M-instruct-exp-gguf:Q4_K_M \
  --chat-template alpaca \
  -p "Write a short poem about open source AI." \
  -n 256

# Or run as a local OpenAI-compatible server
llama-server -hf SupraLabs/Supra-1.5-50M-instruct-exp-gguf:Q4_K_M \
  --chat-template alpaca \
  -c 5120
```

What's next?
Supra-124M - Base, Chat, Reasoning (legacy family, in production)
Supra-350M - Base, Chat, Reasoning, Coding (legacy family, in production)
All weights Apache 2.0. Feedback welcome!