CoSEF-DBP: Convolution scope expanding fusion network for identifying DNA-binding proteins through bilingual representations

IF 7.5 · Zone 1 (Computer Science) · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Hua Zhang, Xiaoqi Yang, Pengliang Chen, Cheng Yang, Bi Chen, Bo Jiang, Guogen Shan
Expert Systems with Applications, Volume 263, Article 125763. Published 2024-11-10.
DOI: 10.1016/j.eswa.2024.125763
URL: https://www.sciencedirect.com/science/article/pii/S0957417424026307

Abstract

Precisely recognizing DNA-binding proteins (DBPs) from sequences is crucial for understanding the mechanisms that govern protein-DNA interactions in various cellular processes. However, traditional in silico methods for DBP identification face several challenges, such as time-consuming evolutionary modeling based on multiple sequence alignments and intricate feature engineering associated with machine learning or deep learning approaches. In this paper, we introduce an end-to-end predictor for identifying DNA-binding proteins without intricate feature engineering, which enriches the semantics of amino acid sequences by fusing bilingual representations derived from distinct language models. We further design a convolution scope expanding (CoSE) module to widen the receptive fields of convolution kernels, thereby forming protein-level CoSE representation sequences. These representations are subsequently integrated via a BiLSTM in conjunction with a simplified capsule network, enhancing hierarchical feature extraction. Extensive experiments confirm that our model surpasses existing baselines across diverse benchmark datasets, notably achieving at least a 5.1% improvement in MCC value on the UniSwiss dataset.
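
The abstract reports its headline result as a Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, the following is a minimal sketch of the standard binary MCC formula, which ranges from -1 (total disagreement) to 1 (perfect prediction). The function name `binary_mcc` and the toy labels are illustrative assumptions and do not come from the paper; the label convention (1 = DNA-binding, 0 = non-binding) is likewise assumed.

```python
import math


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed convention (not from the paper): 1 = DNA-binding, 0 = non-binding.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is reported as 0 when any marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, purely illustrative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = binary_mcc(y_true, y_pred)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it a common choice when positive and negative classes are imbalanced.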
Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.