Comprehensive Review and Assessment of Computational Methods for Prediction of N6-Methyladenosine Sites

Impact factor: 3.6; Zone 3 (Biology); JCR Q1 (Biology)
Zhengtao Luo, Liyi Yu, Zhaochun Xu, Kening Liu, Lichuan Gu
Journal: Biology-Basel, vol. 13, no. 10. Published: 2024-09-28. DOI: https://doi.org/10.3390/biology13100777. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11504118/pdf/
Citations: 0

Abstract

N6-methyladenosine (m6A) plays a crucial regulatory role in the control of cellular functions and gene expression. Recent advances in sequencing techniques for transcriptome-wide m6A mapping have accelerated the accumulation of m6A site information at a single-nucleotide level, providing more high-confidence training data to develop computational approaches for m6A site prediction. However, it is still a major challenge to precisely predict m6A sites using in silico approaches. To advance the computational support for m6A site identification, here, we curated 13 up-to-date benchmark datasets from nine different species (i.e., H. sapiens, M. musculus, Rat, S. cerevisiae, Zebrafish, A. thaliana, Pig, Rhesus, and Chimpanzee). This will assist the research community in conducting an unbiased evaluation of alternative approaches and support future research on m6A modification. We revisited 52 computational approaches published since 2015 for m6A site identification, including 30 traditional machine learning-based, 14 deep learning-based, and 8 ensemble learning-based methods. We comprehensively reviewed these computational approaches in terms of their training datasets, calculated features, computational methodologies, performance evaluation strategy, and webserver/software usability. Using these benchmark datasets, we benchmarked nine predictors with available online websites or stand-alone software and assessed their prediction performance. We found that deep learning and traditional machine learning approaches generally outperformed scoring function-based approaches. In summary, the curated benchmark dataset repository and the systematic assessment in this study serve to inform the design and implementation of state-of-the-art computational approaches for m6A identification and facilitate more rigorous comparisons of new methods in the future.
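The benchmarking step described above compares each predictor's binary calls (m6A site or not) against labelled benchmark sites. A minimal sketch of how such a comparison could be scored is given below. The abstract does not say which metrics the authors used, so the choice of sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) is an assumption, and the function name `binary_metrics` and the toy labels are hypothetical.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Score 0/1 predictions against 0/1 labels (1 = m6A site).

    Assumed metric set; not taken from the paper itself.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Toy example: three labelled m6A sites and three non-sites (made-up data).
labels = [1, 1, 1, 0, 0, 0]
calls = [1, 1, 0, 0, 0, 1]
scores = binary_metrics(labels, calls)
```

MCC is included in this sketch because it stays informative when positive and negative sites are unevenly represented, which is one reason it is often reported alongside accuracy for site predictors.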

Source journal
Biology-Basel (Biological Science)
CiteScore: 5.70
Self-citation rate: 4.80%
Articles published: 1618
Review time: 11 weeks
About the journal: Biology (ISSN 2079-7737) is an international, peer-reviewed, quick-refereeing open access journal of Biological Science published by MDPI online. It publishes reviews, research papers and communications in all areas of biology and at the interface of related disciplines. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files regarding the full details of the experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.