@article{838907b3186e4478a9f45bdf009f94e0,
title = "Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes",
abstract = "Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. Funding: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. 
Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.",
keywords = "COVID-19, Human Phenotype Ontology, Long COVID, Machine learning, Precision medicine, Semantic similarity",
author = "{N3C Consortium} and {RECOVER Consortium} and Reese, {Justin T.} and Hannah Blau and Elena Casiraghi and Timothy Bergquist and Loomba, {Johanna J.} and Callahan, {Tiffany J.} and Bryan Laraway and Corneliu Antonescu and Ben Coleman and Michael Gargano and Wilkins, {Kenneth J.} and Luca Cappelletti and Tommaso Fontana and Nariman Ammar and Blessy Antony and Murali, {T. M.} and Caufield, {J. Harry} and Guy Karlebach and McMurry, {Julie A.} and Andrew Williams and Richard Moffitt and Jineta Banerjee and Solomonides, {Anthony E.} and Hannah Davis and Kristin Kostka and Giorgio Valentini and David Sahner and Chute, {Christopher G.} and Charisse Madlock-Brown and Haendel, {Melissa A.} and Robinson, {Peter N.} and Heidi Spratt and Shyam Visweswaran and Flack, {Joseph Eugene} and Yoo, {Yun Jae} and Davera Gabriel and Alexander, {G. Caleb} and Mehta, {Hemalkumar B.} and Feifan Liu and Miller, {Robert T.} and Rachel Wong and Hill, {Elaine L.} and Thorpe, {Lorna E.} and Jasmin Divers",
note = "Funding Information: The authors acknowledge the following funding sources: National Institutes of Health grant CD2H NCATS U24 TR002306 (J.T.R., C.C., H.B., N.A., B.L., K.K., M.A.H., P.N.R.). National Institutes of Health grant NHLBI RECOVER Agreement OT2HL161847-01 (J.T.R., K.K., B.L., M.A.H., P.N.R.). National Institutes of Health grant Office of the Director Monarch Initiative R24 OD011883 (M.A.H., P.N.R.). National Institutes of Health grant NHGRI Center of Excellence in Genome Sciences RM1 HG010860 (M.A.H., P.N.R.). National Institutes of Health grant NCATS UL1TR003015 (B.A., T.M.M.). National Institutes of Health grant NCATS KL2TR003016 (B.A., T.M.M.). Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231 (J.T.R.). Donald A. Roux Family Fund at the Jackson Laboratory (P.N.R.). Marsico Family at the University of Colorado Anschutz (M.A.H.). K. Wilkins is an employee of NIH. D. Sahner is a contractor to NIH through Axle Informatics. This study is part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative (https://recovercovid.org/), which seeks to understand, treat, and prevent the post-acute sequelae of SARS-CoV-2 infection (PASC), and was conducted under the N3C DUR RP-5677B5. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH. Medical Authorship was determined using ICMJE recommendations. The analyses described in this publication were conducted with data or tools accessed through the NCATS N3C Data Enclave covid.cd2h.org/enclave and supported by NCATS U24 TR002306.
This research was possible because of the patients whose information is included within the data from participating organisations (covid.cd2h.org/dtas) and the organisations and scientists (covid.cd2h.org/duas) who have contributed to the on-going development of this community resource. The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol # IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources. We gratefully acknowledge the following core contributors to N3C: Anita Walden, Leonie Misquitta, Joni L. Rutter, Kenneth R. Gersing, Penny Wung Burgoon, Samuel Bozzette, Mariam Deacy, Christopher Dillon, Rebecca Erwin-Cohen, Nicole Garbarini, Valery Gordon, Michael G. Kurilla, Emily Carlson Marti, Sam G. Michael, Lili Portilla, Clare Schmitt, Meredith Temple-O'Connor, David A. Eichmann, Warren A. Kibbe, Hongfang Liu, Philip R.O. Payne, Emily R. Pfaff, Peter N. Robinson, Joel H. Saltz, Heidi Spratt, Justin Starren, Christine Suver, Adam B. Wilcox, Andrew E. Williams, Chunlei Wu, Davera Gabriel, Stephanie S. Hong, Kristin Kostka, Harold P. Lehmann, Michele Morris, Matvey B. Palchuk, Xiaohan Tanner Zhang, Richard L. Zhu, Benjamin Amor, Mark M. Bissell, Marshall Clark, Andrew T. Girvin, Stephanie S. Hong, Kristin Kostka, Adam M. Lee, Robert T. Miller, Michele Morris, Matvey B. Palchuk, Kellie M. Walters, Will Cooper, Patricia A. Francis, Rafael Fuentes, Alexis Graves, Julie A. McMurry, Shawn T. O'Neil, Usman Sheikh, Elizabeth Zampino, Katie Rebecca Bradwell, Andrew T. Girvin, Amin Manna, Nabeel Qureshi, Christine Suver, Julie A. McMurry, Carolyn Bramante, Jeremy Richard Harper, Wenndy Hernandez, Farrukh M Koraishy, Amit Saha, Satyanarayana Vedula, Johanna Loomba, Andrea Zhou, Steve Johnson, Evan French, Alfred (Jerrod) Anzalone, Umit Topaloglu, Amy Olex, Hythem Sidkey.
Details of contributions available at covid.cd2h.org/acknowledgements. Funding Information: We acknowledge support from many grants; the content is solely the responsibility of the authors and does not necessarily represent the official views of the N3C Program, the NIH or other funders. In addition, access to N3C Data Enclave resources does not imply endorsement of the research project and/or results by NIH or NCATS. Publisher Copyright: {\textcopyright} 2022 The Authors",
year = "2023",
month = jan,
doi = "10.1016/j.ebiom.2022.104413",
language = "English (US)",
volume = "87",
journal = "EBioMedicine",
issn = "2352-3964",
publisher = "Elsevier BV",
}