r/LocalLLaMA · 1 min read

Meddies PII: An Open Multilingual De-identification Model for Clinical Text

A clinical AI model does not need to know who the patient is to reason clinically.

It needs the symptoms, medications, lab results, diagnosis history, and treatment course.

The problem is that in real medical records, those facts usually sit next to identifiers: names, record IDs, insurance numbers, addresses, phone numbers, admission dates, department names.

So clinical de-identification has a double contract:
1. Do not let patient identifiers leak.
2. Do not destroy the clinical facts that still need to be used.

That second part is easy to underestimate.

If a model misses a date of birth, the privacy boundary fails. If it removes "creatinine 86 µmol/L" or "metformin 500 mg," the downstream clinical record loses meaning. Both are failures, but they have different consequences.
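One way to make the double contract concrete is to score a redacted note on both halves at once. The sketch below is only an illustration, not how Meddies PII is evaluated: `check_contract`, `ContractReport`, and the example strings are hypothetical, and exact substring matching stands in for whatever span-level matching a real evaluation would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContractReport:
    leaked: list[str] = field(default_factory=list)     # identifiers still present
    destroyed: list[str] = field(default_factory=list)  # clinical facts that vanished

    @property
    def ok(self) -> bool:
        # The contract holds only if nothing leaked AND nothing was destroyed.
        return not self.leaked and not self.destroyed


def check_contract(
    redacted: str, identifiers: list[str], clinical_facts: list[str]
) -> ContractReport:
    """Exact-substring check of both halves of the contract (an assumption)."""
    return ContractReport(
        leaked=[s for s in identifiers if s in redacted],
        destroyed=[s for s in clinical_facts if s not in redacted],
    )


# Hypothetical example values, not taken from the Meddies PII dataset.
identifiers = ["Jane Doe", "1980-02-14"]
clinical_facts = ["creatinine 86 µmol/L", "metformin 500 mg"]

good = "[NAME], born [DATE]: creatinine 86 µmol/L, on metformin 500 mg."
over_redacted = "[NAME], born [DATE]: creatinine [ID], on metformin 500 mg."
under_redacted = "[NAME], born 1980-02-14: creatinine 86 µmol/L, on metformin 500 mg."

for text in (good, over_redacted, under_redacted):
    report = check_contract(text, identifiers, clinical_facts)
    print(report.ok, report.leaked, report.destroyed)
```

Keeping `leaked` and `destroyed` as separate lists, instead of folding them into one score, reflects the point that the two failures have different consequences and probably deserve different thresholds.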

We built Meddies PII for this problem. It is an open research model and dataset for multilingual clinical de-identification. The dataset is synthetic and built with dynamic prompting, varying language, document type, document label, note length, text format, edge case, and identifier family across generations.
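For readers wondering what "dynamic prompting" might look like in practice, here is a minimal sketch. The axis names come from the post, but every value in `AXES`, the prompt wording, and the helpers `sample_spec` and `build_prompt` are assumptions made up for illustration. The actual generation pipeline is not described here.

```python
import random

# Axis names follow the post; the values are hypothetical placeholders.
AXES = {
    "language": ["en", "vi", "de"],
    "document_type": ["discharge summary", "nursing form", "lab report"],
    "document_label": ["inpatient", "outpatient"],
    "note_length": ["short", "medium", "long"],
    "text_format": ["free text", "JSON", "XML"],
    "edge_case": ["none", "abbreviated names", "dates without a year"],
    "identifier_family": ["names", "record IDs", "phone numbers"],
}


def sample_spec(rng: random.Random) -> dict[str, str]:
    """Draw one value per axis, so each generation sees a different mix."""
    return {axis: rng.choice(values) for axis, values in AXES.items()}


def build_prompt(spec: dict[str, str]) -> str:
    """Turn a sampled spec into a generation prompt (wording is made up)."""
    lines = [f"- {axis.replace('_', ' ')}: {value}" for axis, value in spec.items()]
    return (
        "Write one synthetic clinical document with these properties:\n"
        + "\n".join(lines)
    )


rng = random.Random(0)  # seeded so a run can be reproduced
print(build_prompt(sample_spec(rng)))
```

Sampling each axis independently is the simplest choice. A real pipeline might weight rare combinations or forbid incoherent ones.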

The goal is not one pretty template. The goal is stable extraction behavior across the messy surfaces hospital data actually appears in: rushed notes, nursing forms, JSON/XML exports, multilingual text, administrative records, and chat-style prompts.

Meddies PII is not a complete de-identification product. Hospitals still need policy, audit logs, local validation, human escalation paths, and deployment controls.

But we think this is a useful starting point: open enough to inspect, careful enough to discuss honestly, and built from the reality that clinical AI needs more than benchmark performance to be deployable.

Full post: https://meddies.ai/research/meddies-pii

Demo: https://huggingface.co/spaces/Meddies/meddies-pii-extractor

Model: https://huggingface.co/Meddies/meddies-pii

Dataset: https://huggingface.co/datasets/Meddies/meddies-pii

submitted by /u/TheREXincoming