UK GDPR Small Business Q&A — 5,000 synthetic pairs with article-level citations [D]
Mirrored from r/MachineLearning for archival readability.
Dataset for fine-tuning compliance assistants. Each pair includes:
- A practical SME-facing question ("Can I use pre-ticked consent boxes?")
- An answer with specific UK GDPR article references, ICO guidance by name, and actionable steps
- Source metadata: which GDPR concepts were used, which generation strategy, timestamp
Generation method: questions via local Qwen 14B from a curated term bank, answers via DeepSeek API for factual reliability. JSON + Parquet, MIT license for the 1K sample.
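To make the pair structure concrete, here is a minimal sketch of what one record might look like and how you could sanity-check the article citations before fine-tuning. The field names (`question`, `answer`, `metadata`, `concepts`, `strategy`, `timestamp`), the sample answer text, and the helper functions are all assumptions for illustration, not the dataset's actual schema, so check the real column names on the dataset card first.

```python
import re

# Hypothetical record: field names and values are illustrative only,
# not taken from the published dataset.
SAMPLE_RECORD = {
    "question": "Can I use pre-ticked consent boxes?",
    "answer": (
        "No. Under Article 4(11) and Article 7 of the UK GDPR, consent "
        "needs a clear affirmative action, so a pre-ticked box does not count."
    ),
    "metadata": {
        "concepts": ["consent"],              # placeholder concept tag
        "strategy": "term_bank",              # placeholder strategy name
        "timestamp": "2024-01-01T00:00:00Z",  # placeholder value
    },
}

# Matches "Article 7" or "Article 4(11)"; sub-paragraph letters are ignored.
ARTICLE_RE = re.compile(r"\bArticles?\s+(\d+)(?:\((\d+)\))?", re.IGNORECASE)

REQUIRED_KEYS = {"question", "answer", "metadata"}


def cited_articles(answer: str) -> list[str]:
    """Return de-duplicated article references in order of appearance."""
    refs: list[str] = []
    for number, paragraph in ARTICLE_RE.findall(answer):
        ref = f"{number}({paragraph})" if paragraph else number
        if ref not in refs:
            refs.append(ref)
    return refs


def is_well_formed(record: dict) -> bool:
    """True if the record has the assumed keys and cites at least one article."""
    return REQUIRED_KEYS.issubset(record) and bool(cited_articles(record["answer"]))
```

Calling `cited_articles(SAMPLE_RECORD["answer"])` on the record above yields `["4(11)", "7"]`. A check like this only confirms that a citation is present and parseable, not that it is legally correct, so it is a cheap filter rather than a substitute for review.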
This is a niche dataset: it's not a benchmark contender, it's for people building privacy tools for UK businesses. If you're doing legal NLP or compliance RAG, it might be useful.
Free sample: https://huggingface.co/datasets/Draeg82/uk-gdpr-small-business-qa