We present a general probabilistic framework for predicting the substrate specificity of enzymes. We designed this approach to be easily applicable to different organisms and enzymes. Therefore, our predictive models do not rely on species-specific properties and use mostly sequence-derived data. Maximum likelihood optimization is used to fine-tune model parameters, and the Akaike Information Criterion is employed to address the issue of correlated variables. As a proof of principle, we apply our approach to predicting the general substrate specificity of yeast methyltransferases (MTases). As input, we use several physico-chemical and biological properties of MTases: structural fold, isoelectric point, expression pattern and cellular localization. Our method accurately predicts whether a yeast MTase methylates a protein, RNA or another molecule. Among our experimentally tested predictions, 89% were confirmed, including the surprising prediction that YOR021C is the first known MTase with a SPOUT fold that methylates a substrate other than RNA, namely a protein. Our approach not only allows for highly accurate prediction of the functional specificity of MTases, but also provides insight into general rules governing MTase substrate specificity.
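The interplay of maximum likelihood fitting and the Akaike Information Criterion (AIC = 2k - 2 ln L, lower is better) can be illustrated with a minimal sketch. This is not the model used in the study: the data are invented toy values, the feature values (`fold_a`, `loc_x`, and so on) and the function names `fit_and_score` and `select_features` are hypothetical, and the parameter count assumes one categorical distribution per observed feature combination.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def fit_and_score(rows, labels, features):
    """Fit P(label | feature values) by relative frequencies, which is the
    maximum likelihood estimate for a categorical distribution.
    Returns (log-likelihood, number of free parameters)."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        counts[key][label] += 1

    log_l = 0.0
    for label_counts in counts.values():
        total = sum(label_counts.values())
        for n in label_counts.values():
            log_l += n * math.log(n / total)

    # Assumption: one categorical distribution per observed feature
    # combination, each with (number of classes - 1) free parameters.
    n_classes = len(set(labels))
    n_params = len(counts) * (n_classes - 1)
    return log_l, n_params


def select_features(rows, labels, candidates):
    """Exhaustively score every feature subset and return (AIC, subset)
    for the subset with the lowest AIC; ties keep the earlier subset."""
    best = None
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            log_l, k = fit_and_score(rows, labels, subset)
            score = aic(log_l, k)
            if best is None or score < best[0]:
                best = (score, subset)
    return best


# Invented toy data: the label depends only on "fold"; "loc" is uninformative.
rows = [
    {"fold": "fold_a", "loc": "loc_x"},
    {"fold": "fold_a", "loc": "loc_y"},
    {"fold": "fold_a", "loc": "loc_x"},
    {"fold": "fold_a", "loc": "loc_y"},
    {"fold": "fold_b", "loc": "loc_x"},
    {"fold": "fold_b", "loc": "loc_y"},
    {"fold": "fold_b", "loc": "loc_x"},
    {"fold": "fold_b", "loc": "loc_y"},
]
labels = ["protein"] * 4 + ["RNA"] * 4

best = select_features(rows, labels, ["fold", "loc"])
print(best)
```

In this toy setting, adding `loc` to `fold` leaves the likelihood unchanged but doubles the parameter count, so the AIC penalty favours the smaller model. The same penalty is what discourages keeping redundant, correlated inputs.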
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Modeling and Simulation
- Ecology, Evolution, Behavior and Systematics
- Molecular Biology
- Cellular and Molecular Neuroscience