DeepSpec - a deepseek-ai Collection
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| DeepSpecDeepSpec is a full-stack codebase for training and evaluating draft models for speculative decoding. It contains data preparation utilities, draft model implementations, training code, and evaluation scripts. Released CheckpointsThe checkpoints below are the ones used for Table 1 in the paper. Each checkpoint was trained on open-perfectblend data generated by its corresponding target model in non-thinking mode, and is the direct output of the corresponding training configuration under config/.
If you cite these results in a new paper, align your setup with the training settings in this repository; otherwise, the comparison is not meaningful. For domain-specific use, fine-tune the draft model again for better results, especially if the target model is expected to run in thinking mode. Supported AlgorithmsCurrently, DeepSpec includes three draft models: DSpark, DFlash and Eagle3. HuggingFace : https://huggingface.co/collections/deepseek-ai/deepspec [link] [comments] |
More from r/LocalLLaMA
-
Been running Qwen3.6-27B through a 3-critic harness. The harness matters more than I thought
Jun 30
-
I Hate Dario Amodei, and everything he stands for.
Jun 29
-
Introducing LongCat-2.0 - , a large-scale MoE language model with 1.6 trillion total parameters and ~48 billion activated per token. This was the stealth model that was on Openrouter under the name 'owl-alpha'.
Jun 29
-
Krea-2-Turbo Image Model - Easy to be fully uncensored, but it can also EDIT Images!
Jun 29
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.