TY - JOUR
T1 - Identification of Patients With Metastatic Prostate Cancer With Natural Language Processing and Machine Learning
AU - Yang, Ruixin
AU - Zhu, Di
AU - Howard, Lauren E.
AU - De Hoedt, Amanda
AU - Williams, Stephen B.
AU - Freedland, Stephen J.
AU - Klaassen, Zachary
PY - 2022/10/1
Y1 - 2022/10/1
N2 - PURPOSE: Understanding treatment patterns and effectiveness for patients with metastatic prostate cancer (mPCa) is dependent on accurate assessment of metastatic status. The objective was to develop a natural language processing (NLP) model for identifying patients with mPCa and evaluate the model's performance against chart-reviewed data and an International Classification of Diseases (ICD) 9/10 code-based method. METHODS: In total, 139,057 radiology reports on 6,211 unique patients from the Department of Veterans Affairs were used. The gold standard was metastases by detailed chart review of radiology reports. NLP performance was assessed by sensitivity, specificity, positive predictive value, negative predictive value, and date of metastases detection. Receiver operating characteristic curves was used to assess model performance. RESULTS: When compared with chart review, the NLP model had high sensitivity and specificity (85% and 96%, respectively). The NLP model was able to predict patient-level metastasis status with a sensitivity of 91% and specificity of 81%, whereas sensitivity and specificity using ICD9/10 billing codes were 73% and 86%, respectively. For the NLP model, date of metastases detection was exactly concordant and within < 1 week in 55% and 58% of patients, compared with 8% and 17%, respectively, using the ICD9/10 billing codes method. The area under the curve for the NLP model was 0.911. A limitation is the NLP model was developed on the basis of a subset of patients with mPCa and may not be generalizable to all patients with mPCa. CONCLUSION: This population-level NLP model for identifying patients with mPCa was more accurate than using ICD9/10 billing codes when compared with chart-reviewed data. Upon further validation, this model may allow for efficient population-level identification of patients with mPCa.
AB - PURPOSE: Understanding treatment patterns and effectiveness for patients with metastatic prostate cancer (mPCa) is dependent on accurate assessment of metastatic status. The objective was to develop a natural language processing (NLP) model for identifying patients with mPCa and evaluate the model's performance against chart-reviewed data and an International Classification of Diseases (ICD) 9/10 code-based method. METHODS: In total, 139,057 radiology reports on 6,211 unique patients from the Department of Veterans Affairs were used. The gold standard was metastases by detailed chart review of radiology reports. NLP performance was assessed by sensitivity, specificity, positive predictive value, negative predictive value, and date of metastases detection. Receiver operating characteristic curves was used to assess model performance. RESULTS: When compared with chart review, the NLP model had high sensitivity and specificity (85% and 96%, respectively). The NLP model was able to predict patient-level metastasis status with a sensitivity of 91% and specificity of 81%, whereas sensitivity and specificity using ICD9/10 billing codes were 73% and 86%, respectively. For the NLP model, date of metastases detection was exactly concordant and within < 1 week in 55% and 58% of patients, compared with 8% and 17%, respectively, using the ICD9/10 billing codes method. The area under the curve for the NLP model was 0.911. A limitation is the NLP model was developed on the basis of a subset of patients with mPCa and may not be generalizable to all patients with mPCa. CONCLUSION: This population-level NLP model for identifying patients with mPCa was more accurate than using ICD9/10 billing codes when compared with chart-reviewed data. Upon further validation, this model may allow for efficient population-level identification of patients with mPCa.
UR - http://www.scopus.com/inward/record.url?scp=85139507133&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139507133&partnerID=8YFLogxK
U2 - 10.1200/CCI.21.00071
DO - 10.1200/CCI.21.00071
M3 - Article
C2 - 36215673
AN - SCOPUS:85139507133
SN - 2473-4276
VL - 6
SP - e2100071
JO - JCO clinical cancer informatics
JF - JCO clinical cancer informatics
ER -