AI content detector based on Qwen 3.5 0.8B fine-tuned on Pangram's dataset
Mirrored from r/LocalLLaMA for archival readability.
I've fine-tuned Qwen 3.5 0.8B on the dataset Pangram released with their EditLens paper. It's available as a Chrome extension: select some text, click, and it shows a probability distribution of how likely the text is to be AI-generated. It takes under 1 s on my M1 MacBook Pro.

Pangram did release a Llama 3.2 3B model trained on their dataset, but I found it a bit dated: too big for what it can do. Qwen 3.5 0.8B (base) ended up performing just as well after roughly 20 hours of fine-tuning on a single RTX 3090. I also tried Qwen 2B and Gemma 4 E2B and E4B, but Qwen 3.5 0.8B handles this task well enough; frankly, the checkpoint I'm shipping in the release had the best results.

Here's the link to the Chrome extension (I called it Slop Hammer 😅). Once it's installed, it lets you download the model from Hugging Face (around 400 MB); after that step, everything runs locally: https://chromewebstore.google.com/detail/slop-hammer/gfjdmhfokmhedlgfggmmgchpppmhkdgg

Here's the model in ONNX format: https://huggingface.co/Slomin/slop_hammer_0_8_b/tree/main

Small disclaimer: the model is licensed under CC-BY-NC-SA-4.0 because of the restrictions on Pangram's EditLens dataset.

If anyone is interested, here's the Pangram paper: https://arxiv.org/abs/2510.03154. It's a pretty interesting approach: the model predicts a distribution over 4 buckets instead of a single 0-1 float from one output neuron.

The main limitation is the dataset they open-sourced, which was created with older LLMs. For example, the model gets a bit confused by GPT-5.5 output (though it still flags it as AI-edited or similar, not as purely human-written).

It's pretty hilarious to go through slop-infested websites like LinkedIn or certain subreddits...
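The 4-bucket output is easy to misread as a single score, so here is a minimal Python sketch of how the raw logits of such a head could be turned into a probability distribution plus one summary number. The bucket names, their ordering (least to most AI), and collapsing them into an expected-value score are assumptions for illustration only; they are not taken from the EditLens paper or the Slop Hammer model, and the logits in the demo are made up.

```python
import math
from typing import Sequence

# HYPOTHETICAL bucket labels and ordering: the post only says the model
# predicts a distribution over 4 buckets; these names are assumptions.
BUCKETS = ("human-written", "lightly AI-edited", "heavily AI-edited", "AI-generated")


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits: Sequence[float]) -> dict:
    """Turn 4 raw bucket logits into probabilities, a top label and a 0-1 score."""
    if len(logits) != len(BUCKETS):
        raise ValueError(f"expected {len(BUCKETS)} logits, got {len(logits)}")
    probs = softmax(logits)
    # Expected bucket index rescaled to [0, 1]: 0 = fully human, 1 = fully AI
    # (only meaningful under the assumed least-to-most-AI ordering).
    score = sum(i * p for i, p in enumerate(probs)) / (len(BUCKETS) - 1)
    top = max(range(len(probs)), key=probs.__getitem__)
    return {"probs": dict(zip(BUCKETS, probs)), "top": BUCKETS[top], "score": score}


if __name__ == "__main__":
    result = summarize([0.2, 1.5, 2.1, -0.3])  # made-up logits for illustration
    print("top bucket:", result["top"])
    for name, p in result["probs"].items():
        print(f"{name:>18}: {p:.3f}")
    print(f"score: {result['score']:.3f}")
```

Showing the whole distribution rather than only the summary score lets a UI tell a lightly edited text apart from a fully generated one, which a single 0-1 output expresses less directly.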