r/LocalLLaMA · May 22, 2026 · 2 min read

trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser

Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3. You can try it here:

https://huggingface.co/spaces/av-codes/prompt-injection-detector

---

I've been interested in prompt injections and agentic security for a while, and wanted to see how a purpose-built ML agent compares to general-purpose coding agents for this kind of task.

Here's roughly how it went:

ml-intern takes an HF token and supports OpenAI-compatible APIs, so I pointed it at OpenRouter (GPU-poor). The agent found existing datasets, deepset/prompt-injections and Shomi28/prompt-injection-dataset, which simplified things since building the dataset is typically 95% of the work in tasks like this.

For v1, I went with DistilBERT targeting CPU inference. After a few parameter sweeps, the agent launched a full run and landed at F1 95.87%.

I also tried training an HRM-Text model, but the agent didn't figure it out and set up a TRM run instead (different architecture, no positional encoding). When I steered it back to HRM with the correct paper, the training script wasn't optimized for my hardware. I spent $20 on HF remote training with a T4, but it fumbled after epoch 1 because agent didn't follow training routine from the paper and used wrong optimiser/params leading to params blowing up.

For v2, I found a larger synthetic dataset from Bordair and re-trained the DistilBERT. That's the model in the Space above.

What surprised me:

DeepSeek v4 Flash via API cost under $5 total for all agent runs
the agent was more hands-off than expected on the happy path
it broke down on non-standard architectures
it naturally leans toward the HF stack, which was fine for this, but worth knowing

The obvious gap: the synthetic dataset means the train/test splits might be too similar. Not a proper scientific approach, but it's the most pleasant ML experience I've had with an agentic tool so far.

The HRM run is still pending. I'm curious to learn about other people's experiences with these tools.

Thank you!

submitted by /u/Everlier
[link] [comments]

Discussion (0)

No comments yet. Sign in and be the first to say something.

Discussion (0)

More from r/LocalLLaMA