A Finetuning Deep Learning Framework for Pan-species Promoters with Pseudo Time Series Analysis on Time and Frequency Space
Authors: Ruimeng Li, Qinke Peng, Haozhou Li, Wentong Sun
Journal: IEEE Journal of Biomedical and Health Informatics (Journal Article; impact factor 6.7; JCR Q1, Computer Science, Information Systems)
Publication date: 2025-05-09
DOI: https://doi.org/10.1109/JBHI.2025.3568145
Citations: 0
Abstract
Promoter identification and classification play crucial roles in unraveling gene mechanisms. Promoters are characterized by specific motifs, known as elements, such as the TATA-box in eukaryotes and the Pribnow box in prokaryotes. These elements are the core components of a promoter and are closely tied to its function. However, the heterogeneity of promoters across species poses a significant challenge to improving identification models. In this study, we introduce ProTriCNN, a deep learning method for promoter identification. Building on a representation of promoter sequences, ProTriCNN treats promoters as pseudo-time series in order to capture the heterogeneity of promoter elements. We further introduce TransPro, a ProTriCNN-based fine-tuning framework that improves identification performance across species. To better align source and target species, TransPro uses elements and species evolutionary trees to represent the locality difference between source and target species across various levels and in time-frequency space, respectively. Compared with state-of-the-art methods, ProTriCNN performs better on all species, with an average accuracy improvement of 2.1% and a 20% improvement in the Matthews correlation coefficient. Compared with ProTriCNN, TransPro attains a further accuracy improvement of up to 8% and a 25% improvement in the Matthews correlation coefficient. The source code and associated datasets are freely available at https://github.com/Limomo33/promoter.
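As a rough illustration of two ideas the abstract mentions, the sketch below shows one plausible way to view a DNA sequence as a pseudo-time series (one 0/1 channel per base, indexed by position), to inspect a channel in frequency space with a naive discrete Fourier transform, and to compute the Matthews correlation coefficient from a confusion matrix. This is not the authors' ProTriCNN or TransPro code: the one-hot encoding, the function names `one_hot_series`, `magnitude_spectrum`, and `mcc`, and the toy sequence are all assumptions made for illustration.

```python
import cmath
import math

BASES = "ACGT"


def one_hot_series(seq):
    """Map a DNA string to four 0/1 channels (A, C, G, T), one value per position.

    Assumed encoding for illustration; the paper's actual representation may differ.
    """
    seq = seq.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in seq] for base in BASES}


def magnitude_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n//2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    toy = "TATAATGCGCTATAAT"  # hypothetical toy sequence, not from the paper's data
    channels = one_hot_series(toy)
    for base in BASES:
        spectrum = magnitude_spectrum(channels[base])
        print(base, [round(v, 2) for v in spectrum])
    print("MCC:", round(mcc(tp=45, tn=40, fp=10, fn=5), 3))
```

Position along the sequence plays the role of time here, so the time-domain view is the channel itself and the frequency-domain view is its spectrum. A real model would likely use a fast Fourier transform from a numerical library rather than this quadratic-time loop.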
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.