DHUpredET: A comparative computational approach for identification of dihydrouridine modification sites in RNA sequence

IF 2.6 | Zone 4 (Biology) | JCR Q2: Biochemical Research Methods
Md Fahim Sultan, Tasmin Karim, Md Shazzad Hossain Shaon, Sayed Mehedi Azim, Iman Dehzangi, Mst Shapna Akter, Sobhy M. Ibrahim, Md Mamun Ali, Kawsar Ahmed, Francis M. Bui
Analytical Biochemistry, volume 702, Article 115828. Published 2025-03-06. DOI: 10.1016/j.ab.2025.115828. Source: https://www.sciencedirect.com/science/article/pii/S0003269725000661
Citation count: 0

Abstract

Laboratory-based detection of dihydrouridine (D) sites is laborious and expensive. In this study, we developed machine learning models that use efficient feature encoding methods to identify D sites. We first explored a range of state-of-the-art feature encoding approaches, trained 30 machine learning techniques on each, and selected the top eight models based on their independent-test and cross-validation results. We then developed DHUpredET, which uses an extra tree classifier to predict D sites. DHUpredET showed balanced performance across all evaluation criteria and outperformed state-of-the-art models by 8% in accuracy and 14% in sensitivity on an independent test set. Further analysis showed that the model achieved its highest accuracy with position-specific two nucleotide (PS2) features, leading us to conclude that PS2 features are the best suited to DHUpredET. We therefore propose the model as a preferred choice for predicting D sites. In addition, an in-depth analysis of local features identified one particularly significant attribute, PS2_299, with a feature score of 0.035. The tool shows promise for accelerating the discovery of D modification sites, which could contribute to targeted therapeutics and to the understanding of RNA structure.
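
The PS2 encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition of position-specific two nucleotide features: each adjacent pair of nucleotides is one-hot encoded over the 16 possible dinucleotides, position by position. The abstract does not give the window length, the alphabet order, or the indexing convention, so the `ACGU` ordering, the 0-based layout, and the function names `ps2_encode` and `describe_feature` are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

# Assumed alphabet order; the abstract does not specify one.
NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
DINUC_INDEX = {d: i for i, d in enumerate(DINUCLEOTIDES)}


def ps2_encode(seq: str) -> list[int]:
    """One-hot encode each adjacent nucleotide pair by position.

    A sequence of length L yields (L - 1) * 16 binary features.
    Pairs containing a symbol outside ACGU are left as all zeros.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    features = [0] * ((len(seq) - 1) * 16)
    for pos in range(len(seq) - 1):
        idx = DINUC_INDEX.get(seq[pos:pos + 2])
        if idx is not None:
            features[pos * 16 + idx] = 1
    return features


def describe_feature(index: int) -> tuple[int, str]:
    """Map a flat feature index back to (0-based position, dinucleotide)."""
    return index // 16, DINUCLEOTIDES[index % 16]
```

Under this assumed layout, a flat index such as 299 decomposes into a pair position and a dinucleotide via `describe_feature(299)`. Whether that matches the attribute the paper calls PS2_299 depends on the authors' own ordering and indexing, which the abstract does not state. The resulting vectors could then be passed to any tree-based classifier.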


Source journal
Analytical Biochemistry (Biology: Analytical Chemistry)
CiteScore: 5.70
Self-citation rate: 0.00%
Articles published: 283
Review time: 44 days
About the journal: The journal's title, Analytical Biochemistry: Methods in the Biological Sciences, declares its broad scope: methods for the basic biological sciences, including biochemistry, molecular genetics, cell biology, proteomics, immunology, bioinformatics and wherever the frontiers of research take the field. The emphasis is on methods ranging from the strictly analytical to the more preparative, including novel approaches to protein purification as well as improvements in cell and organ culture. The techniques covered are equally inclusive, ranging from aptamers to zymology. The journal has been particularly active in: analytical techniques for biological molecules; aptamer selection and utilization; biosensors; chromatography; cloning, sequencing and mutagenesis; electrochemical methods; electrophoresis; enzyme characterization methods; immunological approaches; mass spectrometry of proteins and nucleic acids; metabolomics; nano-level techniques; and optical spectroscopy in all its forms. The journal is reluctant to include most drug and strictly clinical studies, as there are more suitable publication platforms for these types of papers.