A novel system for predicting plant protein kinase superfamily by using machine learning methodology

Q2 Medicine

In Silico Biology | Pub Date: 2010-02-15 | DOI: 10.1145/1722024.1722064

V. Mallika, K. Sivakumar, E. Soniya


Citations: 0

Abstract

Protein kinases form one of the largest protein superfamilies and are involved in almost every cellular process. In plants, their important roles in cellular communication, growth and development have made them the subject of a growing body of research. A tool that estimates the probability that a sequence is a plant protein kinase would simplify this work and accelerate experimental characterization. To this end, a high-performance prediction server, 'PhytokinaseSVM', has been developed and is available at http://type3pks.in/kinase. It was built using a support vector machine, a kernel-based supervised learning technique, together with compositional properties of the sequences, including dipeptide and multiplet frequencies. Based on the limited data available, the tool provides a simple platform for estimating the probability that a given sequence is a plant protein kinase, with moderately high accuracy (98%). PhytokinaseSVM achieved 96% specificity and 100% sensitivity when tested with 500 protein kinases and 500 non-protein kinases that were not part of the training dataset. Because it is freely available, we expect the tool to serve as a useful resource for plant protein kinase researchers. The tool can also predict other eukaryotic protein kinases. Work is in progress to further improve prediction accuracy by including more sequence features in the training dataset.
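As an illustration only, the following Python sketch shows one common way to compute dipeptide composition features of the kind the abstract mentions, and checks that the reported sensitivity, specificity and accuracy are mutually consistent. The abstract does not give the authors' exact feature encoding, so the normalisation used here (pair count divided by the number of valid pairs) is an assumption, and the function names are hypothetical. Multiplet frequency and the SVM training step are not shown. The confusion-matrix counts are derived from the reported percentages and test-set sizes, not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (dipeptides).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for one sequence.

    Assumption: frequency = count of the pair / number of valid overlapping
    pairs. Pairs containing non-standard residues (e.g. X) are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Toy sequence, purely illustrative; a real input would be a full protein.
features = dipeptide_composition("MKKLAAKK")

# Counts implied by the abstract: 100% sensitivity on 500 kinases gives
# 500 true positives; 96% specificity on 500 non-kinases gives 480 true
# negatives and 20 false positives.
sens, spec, acc = confusion_metrics(tp=500, fn=0, tn=480, fp=20)
```

Under these derived counts, 980 of the 1000 test sequences are classified correctly, which agrees with the 98% accuracy quoted in the abstract. A feature vector like `features` is the sort of fixed-length input a support vector machine classifier could be trained on.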




Source journal

In Silico Biology Computer Science-Computational Theory and Mathematics

CiteScore

2.20

Self-citation rate

0.00%


Journal description: The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. Although far from complete, the overwhelming quantity of small pieces of information gathered for all kinds of biological systems at the molecular and cellular level requires computational tools to be adequately stored and interpreted. Interpreting data means abstracting them as far as possible to provide a systematic, integrative view of biology. Most of the presently available scientific journals focus either on accumulating more data from elaborate experimental approaches, or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious.