
Identifying Hospitalizations for Worsening Heart Failure with NLP

A Natural Language Processing–Based Approach for Identifying Hospitalizations for Worsening Heart Failure Within an Integrated Health Care Delivery System
21st Nov 2021


Andrew P. Ambrosy, MD; Rishi V. Parikh, MPH; Sue Hee Sung, MPH; Anand Narayanan, MD; Rajeev Masson, MD; Phuong-Quang Lam, MD; Kevin Kheder, MD; Alan Iwahashi, MD; Alexander B. Hardwick, MD; Jesse K. Fitzpatrick, MD; Harshith R. Avula, MD, MPH; Van N. Selby, MD; Xian Shen, PhD; Navneet Sanghera, MPharm; Joaquim Cristino, MSc; Alan S. Go, MD


Importance

The current understanding of epidemiological mechanisms and temporal trends in hospitalizations for worsening heart failure (WHF) is based on claims and national reporting databases. However, these data sources are inherently limited by the accuracy and completeness of diagnostic coding and/or voluntary reporting.


Objective

To assess the overall burden of and temporal trends in the rate of hospitalizations for WHF.


Design, Setting, and Participants

This cohort study, performed from January 1, 2010, to December 31, 2019, used electronic health record (EHR) data from a large integrated health care delivery system.

Main Outcomes and Measures

Hospitalizations for WHF (ie, excluding observation stays) were defined as 1 symptom or more, 2 objective findings or more including 1 sign or more, and 2 doses or more of intravenous loop diuretics and/or new hemodialysis or continuous kidney replacement therapy. Symptoms and signs were identified using natural language processing (NLP) algorithms applied to EHR data.
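The definition above can be read as a simple rule over per-hospitalization counts. The following Python sketch illustrates that reading only and is not the study's implementation: the `Hospitalization` record, its field names, and the assumption that signs count toward the total of objective findings are all hypothetical, and the NLP step that would produce the symptom and sign counts is not shown.

```python
from dataclasses import dataclass


@dataclass
class Hospitalization:
    """Hypothetical per-admission summary; field names are illustrative only."""
    n_symptoms: int                  # symptoms extracted from notes (e.g., by NLP)
    n_signs: int                     # signs extracted from notes (e.g., by NLP)
    n_other_objective_findings: int  # objective findings that are not signs
    iv_loop_diuretic_doses: int      # intravenous loop diuretic doses given
    new_kidney_replacement: bool     # new hemodialysis or continuous kidney replacement therapy
    observation_stay: bool = False   # observation stays are excluded


def meets_whf_criteria(h: Hospitalization) -> bool:
    """Apply the three-part definition stated in the abstract."""
    if h.observation_stay:
        return False
    # 1 symptom or more
    has_symptom = h.n_symptoms >= 1
    # 2 objective findings or more, including 1 sign or more
    # (assumption: signs count toward the objective-finding total)
    n_objective = h.n_signs + h.n_other_objective_findings
    has_objective = h.n_signs >= 1 and n_objective >= 2
    # 2 doses or more of IV loop diuretics and/or new kidney replacement therapy
    has_treatment = h.iv_loop_diuretic_doses >= 2 or h.new_kidney_replacement
    return has_symptom and has_objective and has_treatment
```

Under this reading, an admission with 1 symptom, 1 sign, 1 other objective finding, and 2 diuretic doses qualifies, whereas the same admission with no sign (even with 2 other objective findings) does not.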


Results

The study population was composed of 118 002 eligible patients experiencing 287 992 unique hospitalizations (mean [SD] age, 75.6 [13.1] years; 147 203 [51.1%] male; 1655 [0.6%] American Indian or Alaska Native, 28 451 [9.9%] Asian or Pacific Islander, 34 903 [12.1%] Black, 23 452 [8.1%] multiracial, 175 840 [61.1%] White, and 23 691 [8.2%] unknown), including 65 357 with a principal discharge diagnosis and 222 635 with a secondary discharge diagnosis of HF. The study population included 59 868 patients (20.8%) with HF with a reduced ejection fraction (HFrEF) (<40%), 33 361 (11.6%) with HF with a midrange EF (HFmrEF) (40%-49%), 142 347 (49.4%) with HF with a preserved EF (HFpEF) (≥50%), and 52 416 (18.2%) with unknown EF. A total of 58 042 admissions (88.8%) with a principal discharge diagnosis of HF and 62 764 admissions (28.2%) with a secondary discharge diagnosis of HF met the prespecified diagnostic criteria for WHF. Overall, hospitalizations for WHF identified by NLP-based algorithms increased from 5.2 to 7.6 per 100 hospitalizations per year during the study period. Subgroup analyses found an increase in hospitalizations for WHF based on NLP from 1.5 to 1.9 per 100 hospitalizations for HFrEF, from 0.6 to 1.0 per 100 hospitalizations for HFmrEF, and from 2.6 to 3.9 per 100 hospitalizations for HFpEF.

Conclusions and Relevance

The findings of this cohort study suggest that the burden of hospitalizations for WHF may be more than double that previously estimated using only principal discharge diagnosis. There has been a gradual increase in the rate of hospitalizations for WHF with a more noticeable increase observed for HFpEF.
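The "more than double" statement can be checked against the counts reported in the abstract. The short Python sketch below does only that arithmetic. The variable names are illustrative, and the choice of comparator (WHF-confirmed admissions with a principal diagnosis versus all admissions with a principal diagnosis) is an assumption about how the comparison was framed, so both ratios are computed.

```python
# Counts taken from the abstract
principal_total, principal_whf = 65_357, 58_042    # principal discharge diagnosis of HF
secondary_total, secondary_whf = 222_635, 62_764   # secondary discharge diagnosis of HF


def pct(part: int, whole: int) -> float:
    """Percentage rounded to 1 decimal place, as reported in the abstract."""
    return round(100 * part / whole, 1)


# Share of admissions in each group meeting the WHF criteria
principal_pct = pct(principal_whf, principal_total)
secondary_pct = pct(secondary_whf, secondary_total)

# All WHF hospitalizations identified, regardless of diagnosis position
total_whf = principal_whf + secondary_whf

# Assumption: two possible comparators for "principal discharge diagnosis only"
ratio_vs_confirmed_principal = total_whf / principal_whf
ratio_vs_all_principal = total_whf / principal_total

print(principal_pct, secondary_pct, total_whf)
print(round(ratio_vs_confirmed_principal, 2), round(ratio_vs_all_principal, 2))
```

The percentages reproduce the reported 88.8% and 28.2%. The ratio exceeds 2 only when the comparator is WHF-confirmed admissions with a principal diagnosis of HF, which is therefore the reading most consistent with the stated conclusion.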
