To determine whether natural language processing (NLP) of unstructured medical text can improve identification of patients with atherosclerotic cardiovascular disease (ASCVD) who are not using high-intensity statin therapy (HIST) because of statin-associated side effects (SASEs) or other reasons.
Reviewers annotated reasons for not prescribing HIST in the notes of 1152 randomly selected patients from across the Veterans Affairs (VA) healthcare system who were treated for ASCVD but were not receiving HIST. Developers used the reviewer annotations to train the Canary NLP tool to detect and extract notes containing one or more of these reasons. Negative predictive value (NPV), sensitivity, specificity, and area under the curve (AUC) were used to assess accuracy in detecting documents containing reasons when using structured data, NLP-extracted unstructured data, or both data sources combined.
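As an illustration of how these accuracy measures relate to one another, the following minimal Python sketch computes sensitivity, specificity, and NPV for a structured-data flag alone and for a combined flag. The labels are made-up toy values, not study data, and the rule for combining sources (a note is flagged if either source flags it) is an assumption made for illustration; the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and NPV from paired binary labels.

    Assumes each denominator is non-zero (both classes and at least one
    negative prediction are present).
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of true reasons that were detected
        "specificity": tn / (tn + fp),  # share of notes without a reason left unflagged
        "npv": tn / (tn + fn),          # share of unflagged notes that truly lack a reason
    }


# Made-up toy labels for eight notes (illustrative only, not study data).
truth      = [True, True, True, True, False, False, False, False]   # reviewer annotation
structured = [True, True, False, False, False, False, False, True]  # structured-data flag
nlp        = [False, True, True, False, False, False, False, False] # NLP-extracted flag

# Assumed combination rule: flag a note if either source flags it.
combined = [s or n for s, n in zip(structured, nlp)]

structured_metrics = binary_metrics(truth, structured)
combined_metrics = binary_metrics(truth, combined)

print("structured only:", structured_metrics)
print("structured + NLP:", combined_metrics)
```

In this toy example the combined flag recovers one reason that the structured flag misses, which raises sensitivity and NPV while leaving specificity unchanged.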
At least one documented reason for not prescribing HIST occurred in 47% of notes. The most frequent reasons were SASEs (41%) and general intolerance (20%). When identifying notes containing any documented reason for not using HIST, adding NLP-extracted unstructured data to structured data significantly (p<0.05) increased sensitivity (from 0.69 [95% confidence interval (CI) 0.60-0.76] to 0.89 [95% CI 0.81-0.93]), NPV (from 0.90 [95% CI 0.87-0.93] to 0.96 [95% CI 0.93-0.98]), and AUC (from 0.84 [95% CI 0.81-0.88] to 0.91 [95% CI 0.90-0.93]) compared with structured data alone.
NLP extraction of data from unstructured text can identify the reasons patients are not on HIST more completely than structured data alone. The additional information provided through NLP of unstructured free text should help in tailoring and implementing system-level interventions to improve HIST use in patients with ASCVD.