Biological databases in the age of generative artificial intelligence.

Impact factor: 2.4 | JCR: Q2 | Category: Mathematical & Computational Biology
Bioinformatics Advances | Pub Date: 2025-03-20 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf044
Mihai Pop, Teresa K Attwood, Judith A Blake, Philip E Bourne, Ana Conesa, Terry Gaasterland, Lawrence Hunter, Carl Kingsford, Oliver Kohlbacher, Thomas Lengauer, Scott Markel, Yves Moreau, William S Noble, Christine Orengo, B F Francis Ouellette, Laxmi Parida, Natasa Przulj, Teresa M Przytycka, Shoba Ranganathan, Russell Schwartz, Alfonso Valencia, Tandy Warnow
Bioinformatics Advances, volume 5, issue 1, article vbaf044. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11964588/pdf/
Citations: 0

Abstract

Biological databases in the age of generative artificial intelligence.

Summary: Modern biological research critically depends on public databases. The introduction and propagation of errors within and across databases can lead to wasted resources as scientists are led astray by bad data or have to conduct expensive validation experiments. The emergence of generative artificial intelligence systems threatens to compound this problem owing to the ease with which massive volumes of synthetic data can be generated. We provide an overview of several key issues that occur within the biological data ecosystem and make several recommendations aimed at reducing data errors and their propagation. We specifically highlight the critical importance of improved educational programs aimed at biologists and life scientists that emphasize best practices in data engineering. We also argue for increased theoretical and empirical research on data provenance, error propagation, and on understanding the impact of errors on analytic pipelines. Furthermore, we recommend enhanced funding for the stewardship and maintenance of public biological databases.

Availability and implementation: Not applicable.

Source journal metrics: CiteScore 1.60; self-citation rate 0.00%; articles published 0.