ztok — a fast multithreaded tokenizer in Zig that loads tiktoken / HF / SentencePiece and is 2–5× faster
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I built ztok, a tokenizer library focused on being fast and format-agnostic for local pipelines.
- Loads what you already have — .tiktoken, HF tokenizer.json, SentencePiece .model, TokenMonster, Mistral Tekken. Auto-detected.
- Output is bit-identical to tiktoken / HF / SentencePiece on the equivalence gate, so it's a drop-in replacement.
- Faster on the same vocab + same bytes (cl100k vs tiktoken, EPYC 24c/48t): ~2× single-thread, 3.8–5.5× batched (~291–425 MB/s vs ~78). Also faster than HF tokenizers and SentencePiece on their own vocabs.
- 8 language bindings over one C ABI — Python, Node, Ruby, Go, Rust, .NET, Java, Swift.
- Built for the boring-but-useful jobs: RAG chunking with token-cap windows + byte-accurate offsets, and dataset tokenization straight to .bin/.npy for training.
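
To make the last bullet concrete, here is a minimal Python sketch of token-capped windows with byte offsets, plus packing ids into a raw buffer for a .bin file. It does not use ztok's API, which the post doesn't show. `toy_encode` is a hypothetical stand-in tokenizer (whitespace pieces with made-up ids), and `token_windows` and `pack_ids` are illustrative names, not ztok functions. The uint32 little-endian layout is also an assumption; it is chosen here because cl100k ids do not fit in 16 bits.

```python
import re
import struct


def toy_encode(text):
    """Stand-in tokenizer (NOT ztok): returns (token_id, byte_start, byte_end) triples.

    Offsets index into the UTF-8 bytes of the text, not into its characters.
    """
    data = text.encode("utf-8")
    tokens = []
    for m in re.finditer(rb"\s*\S+", data):
        # Hypothetical id: a checksum of the piece bytes. A real vocab lookup goes here.
        tok_id = sum(m.group()) % 65536
        tokens.append((tok_id, m.start(), m.end()))
    return tokens


def token_windows(tokens, max_tokens, overlap=0):
    """Yield (byte_start, byte_end, ids) windows of at most max_tokens tokens each."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    for i in range(0, len(tokens), step):
        window = tokens[i:i + max_tokens]
        # The byte span runs from the first token's start to the last token's end.
        yield window[0][1], window[-1][2], [t[0] for t in window]
        if i + max_tokens >= len(tokens):
            break  # this window already reached the end of the input


def pack_ids(ids):
    """Pack token ids as raw little-endian uint32 (assumed .bin layout)."""
    return struct.pack("<%dI" % len(ids), *ids)
```

Slicing the original UTF-8 bytes with a window's `byte_start` and `byte_end` recovers that chunk's exact text, which is what makes the offsets usable for RAG citations. The bytes returned by `pack_ids` can be written to a file opened in `"wb"` mode.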
Zig 0.16, AGPL-3.0, ~1100 tests. Feedback welcome, especially on vocab formats I'm missing.