TY - JOUR
T1 - Lessons and tips for designing a machine learning study using EHR data
AU - Arbet, Jaron
AU - Brokamp, Cole
AU - Meinzen-Derr, Jareen
AU - Trinkley, Katy E.
AU - Spratt, Heidi M.
N1 - Publisher Copyright:
© The Association for Clinical and Translational Science 2020.
PY - 2021/5
Y1 - 2021/5
AB - Machine learning (ML) provides the ability to examine massive datasets and uncover patterns within data without relying on a priori assumptions such as specific variable associations, linearity in relationships, or prespecified statistical interactions. However, the application of ML to healthcare data has been met with mixed results, especially when using administrative datasets such as the electronic health record. The black-box nature of many ML algorithms contributes to an erroneous assumption that these algorithms can overcome major data issues inherent in large administrative healthcare data. As with other research endeavors, good data and analytic design are crucial to ML-based studies. In this paper, we will provide an overview of common misconceptions about ML, the corresponding truths, and suggestions for incorporating these methods into healthcare research while maintaining a sound study design.
KW - Machine learning
KW - electronic health record
KW - healthcare research
KW - research methodology
KW - translational research
UR - http://www.scopus.com/inward/record.url?scp=85103373850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103373850&partnerID=8YFLogxK
U2 - 10.1017/cts.2020.513
DO - 10.1017/cts.2020.513
M3 - Article
C2 - 33948244
AN - SCOPUS:85103373850
SN - 2059-8661
VL - 5
JO - Journal of Clinical and Translational Science
JF - Journal of Clinical and Translational Science
IS - 1
M1 - e21
ER -