arXiv — NLP / Computation & Language · June 10, 2026 · 4 min read

Pre-AF 13: An Interpretable Atrial Fibrillation Risk Score Mined from Discharge Reports

Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.


Computer Science > Machine Learning

arXiv:2606.10725 (cs)

[Submitted on 9 Jun 2026]

Title: Pre-AF 13: An Interpretable Atrial Fibrillation Risk Score Mined from Discharge Reports

Authors: Olga Shakhmatova, Dmitrii Kriukov, Daniil Larionov, Nikita Khromov, Iaroslav Bespalov, Alexander Zolotarev, Kirill Grishchenkov, Ekaterina Ivanova, Miron Kuznetsov, Ilya Sochenkov, Elizaveta Panchenko, Artem Shelmanov, Dmitry V. Dylov


Abstract: Background. Atrial fibrillation (AF) is the most prevalent cardiac arrhythmia and a major determinant of prognosis. Established AF risk scores rely on factors (older age, hypertension) that are nearly ubiquitous among patients with cardiovascular disease (CVD), so they offer limited stratification in this high-risk group. Most of these scores target long-term (5-10 year) rather than medium-term prediction. We developed interpretable ML models that predict AF risk over 24-month and entire-follow-up horizons in CVD patients using routinely collected hospital data.
Methods. Single-center retrospective study of electronic health records from the National Research Cardiology Center (Russia) for patients aged ≥18 with CVD but without pre-existing AF, hospitalized more than once between January 2012 and May 2019. A custom NLP pipeline, combining a rule-based parser with transformer-based NER, transformed unstructured discharge reports into 73 structured features. Using LightAutoML, we built a full model (73 features), a simple model (reduced subset), and a linear model for a bedside risk score. Performance was assessed by ROC AUC, compared with CHARGE-AF, C2HEST, MHS, and HAVOC, and interpreted via SHAP.
Results. Of 80,576 records from 45,000 patients, 17,562 met inclusion criteria; 1,438 (8.19%) developed AF. The full model reached ROC AUC 0.735 (24-month) and 0.696 (entire follow-up); the simple model was nearly identical (0.725, 0.696). All non-linear models outperformed the four clinical risk scores (ROC AUC 0.53-0.64). The simple model uses 13 features and is named Pre-AF 13. SHAP identified age and left atrial volume as dominant predictors. A linear risk score (Pre-AF 9) stratified observed 24-month AF incidence from ~7% to 36%.
Conclusion. Interpretable ML models built from routinely collected EHR data identify high-AF-risk CVD patients, outperforming established clinical risk scores.
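
The abstract uses two evaluation ideas: ranking quality measured by ROC AUC, and observed AF incidence stratified by a point-based risk score. The Python sketch below illustrates both under stated assumptions. The function names (`roc_auc`, `incidence_by_score`) and the toy data are hypothetical and invented for illustration; this is not the authors' code, and it does not reproduce their LightAutoML models or SHAP analysis. The only figures taken from the abstract are the cohort counts in the final line.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive case
    outranks a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def incidence_by_score(labels, points):
    """Observed event rate within each integer score value."""
    counts = defaultdict(lambda: [0, 0])  # score -> [events, patients]
    for y, p in zip(labels, points):
        counts[p][0] += y
        counts[p][1] += 1
    return {p: events / n for p, (events, n) in sorted(counts.items())}


# Toy data, invented for illustration only (NOT from the paper).
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]  # 1 = developed AF
toy_points = [0, 1, 1, 2, 2, 3, 0, 3]  # hypothetical bedside score

auc = roc_auc(toy_labels, toy_points)
rates = incidence_by_score(toy_labels, toy_points)

# Cohort incidence reported in the abstract: 1,438 of 17,562 patients.
cohort_incidence = 1438 / 17562
```

The pairwise AUC above is quadratic in cohort size, which is fine for a toy example; on a cohort of the size reported here, a rank-based implementation would be the more practical choice.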

Comments:	Main paper with appendix; 3 main figures, 3 supplementary figures, multiple tables. O. Shakhmatova and D. Kriukov contributed equally (co-first authors). E. Panchenko, A. Shelmanov, and D. V. Dylov are co-senior authors. Corresponding authors: O. Shakhmatova (this http URL@gmail.com) and D. V. Dylov (this http URL@skol.tech)
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2606.10725 [cs.LG]
	(or arXiv:2606.10725v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2606.10725 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Artem Shelmanov
[v1] Tue, 9 Jun 2026 11:33:46 UTC (985 KB)

