Overture: an open-source genomics data platform.

IF 11.8 | Zone 2, Biology | Q1, Multidisciplinary Sciences
Mitchell Shiell, Rosi Bajari, Dusan Andric, Jon Eubank, Brandon F Chan, Anders J Richardsson, Azher Ali, Bashar Allabadi, Yelizar Alturmessov, Jared Baker, Ann Catton, Kim Cullion, Daniel DeMaria, Patrick Dos Santos, Henrich Feher, Francois Gerthoffert, Minh Ha, Robin A Haw, Atul Kachru, Alexandru Lepsa, Alexis Li, Rakesh N Mistry, Hardeep K Nahal-Bose, Aleksandra Pejovic, Samantha Rich, Leonardo Rivera, Ciarán Schütte, Edmund Su, Robert Tisma, Jaser Uddin, Chang Wang, Alex N Wilmer, Linda Xiang, Junjun Zhang, Lincoln D Stein, Vincent Ferretti, Mélanie Courtot, Christina K Yung
GigaScience, vol. 14. Published 2025-01-06. DOI: 10.1093/gigascience/giaf038 (https://doi.org/10.1093/gigascience/giaf038). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12020472/pdf/
Citation count: 0

Abstract

Background: Next-generation sequencing has created many new technological challenges in organizing and distributing genomics datasets, which can now routinely reach petabyte scale. Coupled with data-hungry artificial intelligence and machine learning applications, findable, accessible, interoperable, and reusable genomics datasets have never been more valuable. While major archives like the Genomic Data Commons, Sequence Read Archive, and European Genome-Phenome Archive have improved researchers' ability to share and reuse data, and general-purpose repositories such as Zenodo and Figshare provide valuable platforms for research data publication, the diversity of genomics research precludes any one-size-fits-all approach. In many cases, bespoke solutions are required, and despite funding agencies and journals increasingly mandating reusable data practices, researchers still lack the technical support needed to meet the multifaceted challenges of data reuse.

Findings: Overture bridges this gap by providing open-source software for building and deploying customizable genomics data platforms. Its architecture consists of modular microservices, each generalized and given narrow responsibilities, which combine to form complete data management systems. These systems enable researchers to organize, share, and explore their genomics data at any scale. Through Overture, researchers can connect their data to both humans and machines, fostering reproducibility and enabling new insights through controlled data sharing and reuse.

Conclusions: By making these tools freely available, we can accelerate the development of reliable genomic data management across the research community quickly, flexibly, and at multiple scales. Overture is an open-source project licensed under AGPLv3.0 with all source code publicly available from https://github.com/overture-stack and documentation on development, deployment, and usage available from www.overture.bio.

Source journal: GigaScience (Multidisciplinary Sciences)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.