r/MachineLearning

UK GDPR Small Business Q&A — 5,000 synthetic pairs with article-level citations [D]

Dataset for fine-tuning compliance assistants. Each pair includes:
- A practical SME-facing question ("Can I use pre-ticked consent boxes?")
- An answer with specific UK GDPR article references, ICO guidance by name, and actionable steps
- Source metadata: the GDPR concepts used, the generation strategy, and a timestamp
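
As a rough illustration of the record shape listed above, the sketch below builds one hypothetical pair and pulls the article references out of its answer with a regular expression. The field names (`question`, `answer`, `concepts`, `strategy`, `timestamp`), the sample answer text, the placeholder timestamp, and the `extract_articles` helper are all assumptions made for illustration, not the dataset's actual schema; check the dataset card for the real column names.

```python
import re

# Hypothetical record: field names and values are illustrative assumptions,
# not the dataset's actual schema.
record = {
    "question": "Can I use pre-ticked consent boxes?",
    "answer": (
        "No. Under Article 4(11) and Article 7 of the UK GDPR, consent must be "
        "an unambiguous, affirmative action, so a pre-ticked box is not valid."
    ),
    "concepts": ["consent"],
    "strategy": "term_bank",              # made-up strategy label
    "timestamp": "2024-01-01T00:00:00Z",  # placeholder value
}

# Matches references such as "Article 7" or "Article 4(11)".
ARTICLE_RE = re.compile(r"\bArticle\s+(\d+(?:\(\d+\))?)")


def extract_articles(answer: str) -> list[str]:
    """Return article references in order of first appearance, without repeats."""
    seen: list[str] = []
    for ref in ARTICLE_RE.findall(answer):
        if ref not in seen:
            seen.append(ref)
    return seen


print(extract_articles(record["answer"]))
```

A check like this could serve as a cheap sanity filter before fine-tuning, for example to drop pairs whose answers cite no article at all.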

Generation method: questions come from a local Qwen 14B model prompted with a curated term bank; answers come from the DeepSeek API, chosen for factual reliability. Formats are JSON and Parquet, and the 1K sample is MIT-licensed.
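
For readers who want to picture the two-stage flow, here is a minimal sketch in which one callable turns a term into a question and a second turns the question into an answer. Everything in it is hypothetical: `build_pair`, the stand-in model functions, the record fields, and the `term_bank` strategy label are placeholders, and no real Qwen or DeepSeek client call is shown, since the post does not describe the actual prompts or code.

```python
from datetime import datetime, timezone
from typing import Callable


def build_pair(
    term: str,
    ask: Callable[[str], str],
    answer: Callable[[str], str],
    strategy: str = "term_bank",  # hypothetical strategy label
) -> dict:
    """Generate one Q&A record: term -> question -> answer, plus metadata."""
    question = ask(term)
    return {
        "question": question,
        "answer": answer(question),
        "concepts": [term],
        "strategy": strategy,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Stand-ins for the two models. A real pipeline would call a local question
# model and a remote answer API here instead.
def fake_question_model(term: str) -> str:
    return f"What does '{term}' mean for my small business?"


def fake_answer_model(question: str) -> str:
    return f"[answer to: {question}]"


term_bank = ["consent", "data retention"]  # illustrative terms only
pairs = [build_pair(t, fake_question_model, fake_answer_model) for t in term_bank]
print(pairs[0]["question"])
```

Passing the models in as plain callables keeps the question and answer stages independent, so either one can be swapped without touching the record-building logic.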

This is a niche dataset. It's not a benchmark contender; it's for people building privacy tools for UK businesses. If you're doing legal NLP or compliance RAG, it might be useful.

Free sample: https://huggingface.co/datasets/Draeg82/uk-gdpr-small-business-qa

submitted by /u/a_serial_hobbyist_