CoSEF-DBP: Convolution scope expanding fusion network for identifying DNA-binding proteins through bilingual representations
Hua Zhang, Xiaoqi Yang, Pengliang Chen, Cheng Yang, Bi Chen, Bo Jiang, Guogen Shan
Expert Systems with Applications, volume 263, Article 125763. Published 2024-11-10.
DOI: 10.1016/j.eswa.2024.125763
URL: https://www.sciencedirect.com/science/article/pii/S0957417424026307
Impact factor: 7.5. JCR: Q1 (Computer Science, Artificial Intelligence).
Citations: 0
Abstract
Precisely recognizing DNA-binding proteins (DBPs) from sequences is crucial for a profound comprehension of the mechanisms governing protein-DNA interactions in various cellular processes. However, traditional in-silico methods for DBP identification encounter several challenges, such as time-consuming evolutionary modeling based on multiple sequence alignments, and intricate feature engineering associated with machine or deep learning approaches. In this paper, we introduce a novel end-to-end predictor for identifying DNA-binding proteins without intricate feature engineering, which innovatively enriches the semantics of amino acid sequences through the fusion of bilingual representations derived from distinct language models. We further design a convolution scope expanding (CoSE) module to widen the receptive fields of convolution kernels, thereby forming protein-level CoSE representation sequences. These representations are subsequently integrated via BiLSTM in conjunction with a simplified capsule network, enhancing the hierarchical feature extraction capability. Extensive experiments confirm that our model surpasses existing baselines across diverse benchmark datasets, notably achieving at least a 5.1% improvement in MCC value on the UniSwiss dataset.
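The abstract reports its headline result as a Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, the following is a minimal sketch of the standard MCC formula for binary classification. It is not the authors' evaluation code; the function names `mcc` and `confusion_counts` and the label vectors are hypothetical, chosen only for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = DNA-binding, 0 = non-binding)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here, for the case where a row or column of the matrix is empty).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical labels, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = mcc(*confusion_counts(y_true, y_pred))
print(f"MCC = {score:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). Because it uses all four cells of the confusion matrix, it is less easily inflated by class imbalance than plain accuracy.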
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.