Trinh-trung-duong Nguyen, Syun Chen, Quang-Thai Ho, Yu-Yen Ou
Title: Using multiple convolutional window scanning of convolutional neural network for an efficient prediction of ATP-binding sites in transport proteins
Journal: Proteins: Structure
Pages: 1486-1492
Published: 2022-03-04
DOI: 10.1002/prot.26329
URL: https://doi.org/10.1002/prot.26329
Citations: 2
Abstract
Information from protein multiple sequence alignments has long been an important feature for inferring the function of a protein from related sequences with known functions. It is also one of the underlying ideas of AlphaFold 2, a breakthrough study and model for predicting the three-dimensional structures of proteins from their primary sequences. Our study used multiple sequence alignment information in the form of position-specific scoring matrices as input. We also refined the use of the convolutional neural network, a well-known deep-learning architecture with impressive results on image and image-like data. Specifically, we revisited the prediction of adenosine triphosphate (ATP)-binding sites with more efficient convolutional neural networks. We applied convolutional filters with multiple scanning windows to the position-specific scoring matrices to capture as much useful information as possible. Furthermore, only the most specific motif in each feature map is retained by a one-max pooling layer before being passed to the next layer. We assumed that this would help retain the most conserved motifs, which carry discriminative information for prediction. Our experimental results show that a convolutional neural network with relatively few convolutional layers can be enough to extract the conserved information of proteins, which leads to higher performance. Our best prediction models were obtained after examining different hyper-parameters. The results also showed that our models were superior to the traditional use of convolutional neural networks on the same datasets, as well as to other machine-learning classification algorithms.
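The scanning and pooling idea described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The window sizes (3, 5, 7), the number of filters per size, the 17-residue input window, the random weights, and the helper names `conv_one_max`, `random_filters`, and `multi_window_features` are all assumptions made for illustration. Each filter spans the full 20-column width of the position-specific scoring matrix, slides along the sequence axis, and contributes only its largest ReLU response (one-max pooling).

```python
import random


def conv_one_max(pssm, kernel, bias=0.0):
    """Slide one full-width filter down the PSSM and return its largest ReLU response."""
    height = len(kernel)  # window size: number of consecutive residues covered
    best = 0.0            # ReLU output is never below zero
    for start in range(len(pssm) - height + 1):
        total = bias
        for offset in range(height):
            row = pssm[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], row))
        # The running max against 0.0 combines ReLU with one-max pooling.
        best = max(best, total)
    return best


def random_filters(window_sizes, filters_per_size, width=20, seed=0):
    """Build untrained filters (hypothetical helper); trained weights would replace these."""
    rng = random.Random(seed)
    return {
        size: [
            [[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(size)]
            for _ in range(filters_per_size)
        ]
        for size in window_sizes
    }


def multi_window_features(pssm, filter_banks):
    """Concatenate the one-max response of every filter across all window sizes."""
    features = []
    for kernels in filter_banks.values():
        features.extend(conv_one_max(pssm, kernel) for kernel in kernels)
    return features


# Toy input: a 17-residue window of a PSSM with 20 columns (random values, not real data).
rng = random.Random(1)
pssm = [[rng.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(17)]

banks = random_filters(window_sizes=[3, 5, 7], filters_per_size=4)
features = multi_window_features(pssm, banks)
print(len(features))  # 12: three window sizes times four filters each
```

In a full model, a feature vector like this would be passed to later layers that classify the central residue as ATP-binding or not. Here the weights are random, so the values themselves carry no biological meaning.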