TY - GEN
T1 - Scalable and Reproducible Virtual Screening through an API-Integrated Workflow
AU - Huff, Tiffany
AU - Darrow, Austin
AU - Medina, Joshua
AU - Ferlanti, Erik
AU - Carson, James
AU - Fonner, John
AU - Tijerina, Sal
AU - Watowich, Stanley J.
AU - Allen, William J.
N1 - Publisher Copyright: © 2023 Owner/Author.
PY - 2023/7/23
Y1 - 2023/7/23
N2 - Virtual screening is a key step of the drug discovery process that uses computational resources to simulate the behavior of small molecules in the binding site of a target protein [13]. Researchers often test millions of molecules when searching for an early hit compound, which requires significant CPU hours. An accessible, convenient, fast, and computationally efficient means of virtual screening is therefore desirable so that researchers can conserve resources in the early phase of drug discovery. We developed an application programming interface (API) integrated workflow that allows researchers to submit virtual screening batch jobs to the Lonestar6 supercomputer through a web portal. The containerized [7] workflow employs parallelized Python scripting using mpi4py [2] to efficiently distribute molecular docking tasks performed by AutoDock Vina [3]. The Texas Advanced Computing Center (TACC) API (TAPIS) framework [16], a REST API framework for research computing, was used to integrate the workflow into the University of Texas System Research Cyberinfrastructure (UTRC) web portal [12]. Five large libraries representing commercially available small molecules or fragments were prepared and are available for screening. Here, we discuss our experience developing this service, as well as the results of extensive internal benchmarks to determine the most efficient parallelization scheme for each molecule library when submitting batch jobs. Regardless of the chosen ligand library, the core, node, and parallel task specifications allow the user to run a virtual drug screen and receive the top docking scores in 24 hours. The service is available to registered academic users, and more information can be found at the Drug Discovery at TACC website.
KW - API
KW - Apptainer
KW - Bioinformatics
KW - Containers
KW - Datasets
KW - High-performance computing
KW - MPI
KW - Molecular Docking
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=85176231203&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85176231203&partnerID=8YFLogxK
U2 - 10.1145/3569951.3597599
DO - 10.1145/3569951.3597599
M3 - Conference contribution
AN - SCOPUS:85176231203
T3 - PEARC 2023 - Computing for the common good: Practice and Experience in Advanced Research Computing
SP - 196
EP - 199
BT - PEARC 2023 - Computing for the common good
PB - Association for Computing Machinery, Inc
T2 - 2023 Practice and Experience in Advanced Research Computing, PEARC 2023
Y2 - 23 July 2023 through 27 July 2023
ER -