Identification of Patients With Metastatic Prostate Cancer With Natural Language Processing and Machine Learning

Ruixin Yang, Di Zhu, Lauren E. Howard, Amanda De Hoedt, Stephen B. Williams, Stephen J. Freedland, Zachary Klaassen

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


PURPOSE: Understanding treatment patterns and effectiveness for patients with metastatic prostate cancer (mPCa) depends on accurate assessment of metastatic status. The objective was to develop a natural language processing (NLP) model for identifying patients with mPCa and to evaluate the model's performance against chart-reviewed data and an International Classification of Diseases (ICD) 9/10 code-based method.

METHODS: In total, 139,057 radiology reports on 6,211 unique patients from the Department of Veterans Affairs were used. The gold standard was metastases identified by detailed chart review of radiology reports. NLP performance was assessed by sensitivity, specificity, positive predictive value, negative predictive value, and date of metastases detection. Receiver operating characteristic curves were used to assess model performance.

RESULTS: When compared with chart review, the NLP model had high sensitivity and specificity (85% and 96%, respectively). The NLP model was able to predict patient-level metastasis status with a sensitivity of 91% and specificity of 81%, whereas sensitivity and specificity using ICD9/10 billing codes were 73% and 86%, respectively. For the NLP model, the date of metastases detection was exactly concordant in 55% of patients and within < 1 week in 58% of patients, compared with 8% and 17%, respectively, using the ICD9/10 billing codes method. The area under the curve for the NLP model was 0.911. A limitation is that the NLP model was developed on the basis of a subset of patients with mPCa and may not be generalizable to all patients with mPCa.

CONCLUSION: This population-level NLP model for identifying patients with mPCa was more accurate than ICD9/10 billing codes when compared with chart-reviewed data. Upon further validation, this model may allow for efficient population-level identification of patients with mPCa.
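The four performance measures named in the abstract (sensitivity, specificity, positive predictive value, negative predictive value) all derive from a 2x2 comparison of predicted labels against the chart-review gold standard. The following is a minimal Python sketch of that calculation, not the authors' code: the names `ConfusionCounts`, `counts_from_labels`, and `diagnostic_metrics` are hypothetical, and the labels in the demo are invented for illustration and are not data from the study.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: prediction vs. gold standard (hypothetical helper)."""
    tp: int  # predicted metastatic, gold standard metastatic
    fp: int  # predicted metastatic, gold standard non-metastatic
    tn: int  # predicted non-metastatic, gold standard non-metastatic
    fn: int  # predicted non-metastatic, gold standard metastatic


def counts_from_labels(gold: Sequence[int], predicted: Sequence[int]) -> ConfusionCounts:
    """Tally the 2x2 table from paired binary labels (1 = metastatic, 0 = not)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        if g == 1 and p == 1:
            tp += 1
        elif g == 0 and p == 1:
            fp += 1
        elif g == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions of the four measures listed in the abstract."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "ppv": _ratio(c.tp, c.tp + c.fp),          # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
    }


if __name__ == "__main__":
    # Invented labels for illustration only; not taken from the paper.
    gold_labels = [1, 1, 0, 0, 1]
    model_labels = [1, 0, 0, 1, 1]
    for name, value in diagnostic_metrics(counts_from_labels(gold_labels, model_labels)).items():
        print(f"{name}: {value:.3f}")
```

The same function can be applied once to the NLP model's predictions and once to an ICD code-based classification of the same patients, which is the kind of side-by-side comparison the abstract reports.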

Original language: English (US)
Pages (from-to): e2100071
Journal: JCO Clinical Cancer Informatics
State: Published - Oct 1 2022
Externally published: Yes

ASJC Scopus subject areas

  • General Medicine

