Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition
{"title":"Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition","authors":"Suman Dutta, Rajkumar U. Zunjare, Anirban Sil, Dwijesh Chandra Mishra, Alka Arora, Nisrita Gain, Gulab Chand, Rashmi Chhabra, Vignesh Muthusamy, Firoz Hossain","doi":"10.1007/s00726-023-03368-0","DOIUrl":null,"url":null,"abstract":"<div><p>The mutant <i>matrilineal</i> (<i>mtl</i>) gene encoding patatin-like phospholipase activity is involved in <i>in-vivo</i> maternal haploid induction in maize. Doubling of chromosomes in haploids by colchicine treatment leads to complete fixation of inbreds in just one generation compared to 6–7 generations of selfing. Thus, knowledge of patatin-like proteins in other crops assumes great significance for <i>in-vivo</i> haploid induction. So far, no online tool is available that can classify unknown proteins into patatin-like proteins. Here, we aimed to optimize a machine learning-based algorithm to predict the patatin-like phospholipase activity of unknown proteins. Four different kernels [radial basis function (RBF), sigmoid, polynomial, and linear] were used for building support vector machine (SVM) classifiers using six different sequence-based compositional features (AAC, DPC, GDPC, CTDC, CTDT, and GAAC). A total of 1170 protein sequences including both patatin-like (585 sequences) from various monocots, dicots, and microbes; and non-patatin-like proteins (585 sequences) from different subspecies of <i>Zea mays</i> were analyzed. RBF and polynomial kernels were quite promising in the prediction of patatin-like proteins. Among six sequence-based compositional features, di-peptide composition attained > 90% prediction accuracies using RBF and polynomial kernels. Using mutual information, most explaining dipeptides that contributed the highest to the prediction process were identified. The knowledge generated in this study can be utilized in other crops prior to the initiation of any experiment. The developed SVM model opened a new paradigm for scientists working in <i>in-vivo</i> haploid induction in commercial crops. This is the first report of machine learning of the identification of proteins with patatin-like activity.</p></div>","PeriodicalId":7810,"journal":{"name":"Amino Acids","volume":null,"pages":null},"PeriodicalIF":3.0000,"publicationDate":"2024-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s00726-023-03368-0.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Amino Acids","FirstCategoryId":"99","ListUrlMain":"https://link.springer.com/article/10.1007/s00726-023-03368-0","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
The mutant matrilineal (mtl) gene encoding patatin-like phospholipase activity is involved in in-vivo maternal haploid induction in maize. Doubling of chromosomes in haploids by colchicine treatment leads to complete fixation of inbreds in just one generation compared to 6–7 generations of selfing. Thus, knowledge of patatin-like proteins in other crops assumes great significance for in-vivo haploid induction. So far, no online tool is available that can classify unknown proteins into patatin-like proteins. Here, we aimed to optimize a machine learning-based algorithm to predict the patatin-like phospholipase activity of unknown proteins. Four different kernels [radial basis function (RBF), sigmoid, polynomial, and linear] were used for building support vector machine (SVM) classifiers using six different sequence-based compositional features (AAC, DPC, GDPC, CTDC, CTDT, and GAAC). A total of 1170 protein sequences including both patatin-like (585 sequences) from various monocots, dicots, and microbes; and non-patatin-like proteins (585 sequences) from different subspecies of Zea mays were analyzed. RBF and polynomial kernels were quite promising in the prediction of patatin-like proteins. Among six sequence-based compositional features, di-peptide composition attained > 90% prediction accuracies using RBF and polynomial kernels. Using mutual information, most explaining dipeptides that contributed the highest to the prediction process were identified. The knowledge generated in this study can be utilized in other crops prior to the initiation of any experiment. The developed SVM model opened a new paradigm for scientists working in in-vivo haploid induction in commercial crops. This is the first report of machine learning of the identification of proteins with patatin-like activity.
期刊介绍:
Amino Acids publishes contributions from all fields of amino acid and protein research: analysis, separation, synthesis, biosynthesis, cross linking amino acids, racemization/enantiomers, modification of amino acids as phosphorylation, methylation, acetylation, glycosylation and nonenzymatic glycosylation, new roles for amino acids in physiology and pathophysiology, biology, amino acid analogues and derivatives, polyamines, radiated amino acids, peptides, stable isotopes and isotopes of amino acids. Applications in medicine, food chemistry, nutrition, gastroenterology, nephrology, neurochemistry, pharmacology, excitatory amino acids are just some of the topics covered. Fields of interest include: Biochemistry, food chemistry, nutrition, neurology, psychiatry, pharmacology, nephrology, gastroenterology, microbiology