TY - JOUR
T1 - EnZymClass
T2 - Substrate specificity prediction tool of plant acyl-ACP thioesterases based on ensemble learning
AU - Banerjee, Deepro
AU - Jindra, Michael A.
AU - Linot, Alec J.
AU - Pfleger, Brian F.
AU - Maranas, Costas D.
N1 - Publisher Copyright: © 2021 The Authors
PY - 2022/1
Y1 - 2022/1
AB - Experimentally characterizing the functional properties of plant acyl-ACP thioesterases (TEs), a key enzyme class used in the production of renewable oleochemicals in microbial hosts, can be expensive and time-consuming because it requires manual screening of thousands of candidates in a database. Using amino acid sequence to computationally predict an enzyme's function might accelerate this process; however, obtaining the amount of information on previously characterized enzymes and their respective sequences that standard machine learning (ML) approaches require to accurately infer sequence-function relationships can be prohibitive, especially with a low-throughput testing cycle. Experimental noise and an unbalanced dataset in which high sequence similarity does not always imply identical functional properties further prevent robust prediction performance. Herein we present an ML method, Ensemble method for enZyme Classification (EnZymClass), that is specifically designed to address these issues. We used EnZymClass to classify TEs into short, long and mixed free fatty acid substrate specificity categories. While general guidelines for inferring substrate specificity have been proposed before, prediction of chain-length preference from primary sequence has remained elusive for plant acyl-ACP TEs. By applying EnZymClass to a subset of TEs in the ThYme database, we identified two medium-chain TEs, ClFatB3 and CwFatB2, with previously uncharacterized activity in E. coli fatty acid production hosts. EnZymClass can be readily applied to other protein classification challenges and is available at: https://github.com/deeprob/ThioesteraseEnzymeSpecificity.
UR - http://www.scopus.com/inward/record.url?scp=85121655884&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121655884&partnerID=8YFLogxK
U2 - 10.1016/j.crbiot.2021.12.002
DO - 10.1016/j.crbiot.2021.12.002
M3 - Article
AN - SCOPUS:85121655884
SN - 2590-2628
VL - 4
SP - 1
EP - 9
JO - Current Research in Biotechnology
JF - Current Research in Biotechnology
ER -