A Useful Tool for the Identification of DNA-binding Proteins Using Graph Convolutional Network

IF 0.5; Biology, Zone 4; JCR Q4, BIOCHEMICAL RESEARCH METHODS

Current Proteomics Pub Date : 2020-12-10 DOI:10.2174/1570164618999201210225354

Dasheng Chen, Leyi Wei


Citations: 2

Abstract

Both DNA and proteins are important components of living organisms. DNA-binding proteins, a class that includes helicases and proteins that bind single-stranded DNA regions, play a key role in the function of various biomolecules. Although several methods exist for predicting DNA-binding proteins from sequence, graph neural networks have seen limited use for this task. In this article, we develop GCN-DBP, a novel predictor based on graph neural networks for protein classification. Each protein sequence is treated as a document, which is segmented into words according to the concept of k-mers. Document-word relationships and word co-occurrence across the corpus are used to construct a text graph. The predictor then learns protein sequence information with a two-layer graph convolutional network. We conducted additional experiments to compare the proposed method with four existing methods. Finally, we tested GCN-DBP on the independent data set PDB2272, where it reached an accuracy of 64.17% and a Matthews correlation coefficient (MCC) of 28.32%. The results show that the proposed method outperforms the four other methods and can serve as a useful tool for protein classification.
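The pipeline summarized in the abstract (k-mer segmentation, a text graph over documents and words, a two-layer graph convolution, and evaluation by MCC) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state k, the co-occurrence window, the edge weighting, or the node features, so the values below (k = 3, a window of 4 words, raw counts as edge weights, one-hot node features, random untrained weights) are assumptions chosen only to keep the example small. All function names are hypothetical, and the toy sequences are not taken from PDB2272.

```python
import math
import random
from collections import Counter
from itertools import combinations


def kmers(seq, k=3):
    """Split a protein sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_text_graph(seqs, k=3, window=4):
    """Adjacency matrix with document nodes first, then k-mer nodes.

    Edge weights are raw counts, which is an assumption of this sketch.
    """
    docs = [kmers(s, k) for s in seqs]
    vocab = sorted({w for d in docs for w in d})
    index = {w: len(seqs) + j for j, w in enumerate(vocab)}
    n = len(seqs) + len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    for d, words in enumerate(docs):
        # Document-word edges: how often each k-mer occurs in the sequence.
        for w, c in Counter(words).items():
            adj[d][index[w]] = adj[index[w]][d] = float(c)
        # Word-word edges: co-occurrence inside a sliding window.
        for start in range(max(1, len(words) - window + 1)):
            for a, b in combinations(set(words[start:start + window]), 2):
                adj[index[a]][index[b]] += 1.0
                adj[index[b]][index[a]] += 1.0
    return adj, vocab


def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a]
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_forward(a_hat, x, w1, w2):
    """Two-layer GCN: softmax(A_hat ReLU(A_hat X W1) W2), one row per node."""
    h = [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, x), w1)]
    logits = matmul(matmul(a_hat, h), w2)
    probs = []
    for row in logits:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        total = sum(e)
        probs.append([v / total for v in e])
    return probs


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


seqs = ["MKVLAAGIVK", "MKKLAGGIVR"]  # toy sequences for illustration only
adj, vocab = build_text_graph(seqs)
a_hat = normalize(adj)
n = len(adj)
rng = random.Random(0)
x = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # one-hot features
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(n)]  # untrained
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(8)]  # untrained
probs = gcn_forward(a_hat, x, w1, w2)
print(probs[:len(seqs)])  # class probabilities for the two document nodes
```

Only the first `len(seqs)` rows of the output correspond to protein sequences; the remaining rows belong to k-mer nodes. In a trained model the weights would be fitted on labeled document nodes, and the MCC reported in the abstract as 28.32% corresponds to a value of 0.2832 on the usual scale from -1 to 1.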


Source journal

Current Proteomics (BIOCHEMICAL RESEARCH METHODS; BIOCHEMISTRY & MOLECULAR BIOLOGY)

CiteScore: 1.60

Self-citation rate: 0.00%

Review time: >0 weeks

Journal description: Research in the emerging field of proteomics is growing at an extremely rapid rate. The principal aim of Current Proteomics is to publish well-timed in-depth/mini review articles in this fast-expanding area on topics relevant and significant to the development of proteomics. Current Proteomics is an essential journal for everyone involved in proteomics and related fields in both academia and industry. All areas of proteomics are covered, together with the methodology, software, databases, technological advances and applications of proteomics, including functional proteomics. Diverse technologies covered include but are not limited to: protein separation and characterization techniques; 2-D gel electrophoresis and image analysis; techniques for protein expression profiling, including mass spectrometry-based methods and algorithms for correlative database searching; determination of co-translational and post-translational modification of proteins; protein/peptide microarrays; biomolecular interaction analysis; analysis of protein complexes; yeast two-hybrid projects; protein-protein interaction (protein interactome) pathways and cell signaling networks; systems biology; proteome informatics (bioinformatics); knowledge integration and management tools; high-throughput protein structural studies (using mass spectrometry, nuclear magnetic resonance and X-ray crystallography); high-throughput computational methods for protein 3-D structure as well as function determination; and robotics, nanotechnology, and microfluidics.