Bi-SeqCNN: A Novel Light-Weight Bi-Directional CNN Architecture for Protein Function Prediction

IF 3.6, Zone 3 (Biology), JCR Q2 (Biochemical Research Methods)
Vikash Kumar;Akshay Deepak;Ashish Ranjan;Aravind Prakash
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 1922-1933
Published: 2024-07-11 (Journal Article)
DOI: 10.1109/TCBB.2024.3426491
URL: https://ieeexplore.ieee.org/document/10595435/
Citations: 0

Abstract

Deep learning approaches, such as convolutional neural networks (CNNs) and deep recurrent neural networks (RNNs), have been the backbone of protein function prediction, with promising state-of-the-art (SOTA) results. RNNs offer a strong sequential processing mechanism through their built-in ability to (i) focus on past information, (ii) capture both short- and long-range dependencies, and (iii) process sequences bi-directionally. CNNs, however, are confined to short-term information drawn from both the past and the future, although they offer parallelism. Therefore, a novel bi-directional CNN that strictly complies with the sequential processing mechanism of RNNs is introduced and used to develop a protein function prediction framework, Bi-SeqCNN. This is a sub-sequence-based framework. Further, Bi-SeqCNN$^+$ is an ensemble approach that improves the prediction results. To our knowledge, this is the first time bi-directional CNNs have been employed for general temporal data analysis, not just for protein sequences. The proposed architecture yields improvements of up to +5.5% over contemporary SOTA methods on three benchmark protein sequence datasets. Moreover, it is substantially lighter, attaining these results with (0.50–0.70 times) fewer parameters than the SOTA methods.
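
The abstract does not spell out the layer design, so the following is only an illustrative sketch of one way a convolution can be made to respect RNN-style ordering. It assumes a causal (left-padded) 1-D convolution run once over the sequence and once over its reverse, giving each position one feature built from the past and one built from the future. The function names `causal_conv1d` and `bi_causal_conv1d`, the kernels, and the toy input are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D convolution in which output[t] depends only on x[t-k+1 .. t].

    The input is left-padded with zeros, so the output has the same
    length as the input and never reads a position after t.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + [float(v) for v in x]
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


def bi_causal_conv1d(
    x: Sequence[float],
    fwd_kernel: Sequence[float],
    bwd_kernel: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return one (past_feature, future_feature) pair per position.

    Assumption: "bi-directional" is modelled here as two independent
    causal passes, loosely mirroring a bi-directional RNN.
    """
    forward = causal_conv1d(x, fwd_kernel)
    # Reverse, convolve causally, reverse back: output[t] then depends
    # only on x[t .. t+k-1], i.e. the current and following positions.
    backward = causal_conv1d(list(reversed(x)), bwd_kernel)[::-1]
    return list(zip(forward, backward))


if __name__ == "__main__":
    # Toy numeric sequence standing in for an encoded protein sub-sequence.
    print(bi_causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], [1.0, 1.0]))
    # prints [(1.0, 3.0), (3.0, 5.0), (5.0, 7.0), (7.0, 4.0)]
```

Under these assumptions, every output position can still be computed independently of the others, which is the parallelism the abstract attributes to CNNs, while the padding direction enforces the past-only and future-only views.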
Source journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
CiteScore: 7.50
Self-citation rate: 6.70%
Articles published: 479
Review time: 3 months
Journal description: IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results that are obtained from the use of these methods, programs and databases; and the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system.