An interconnected data infrastructure to support large-scale rare disease research.

Impact factor: 11.8 | Zone 2 (Biology) | JCR Q1, Multidisciplinary Sciences
Lennart F Johansson, Steve Laurie, Dylan Spalding, Spencer Gibson, David Ruvolo, Coline Thomas, Davide Piscia, Fernanda de Andrade, Gerieke Been, Marieke Bijlsma, Han Brunner, Sandi Cimerman, Farid Yavari Dizjikan, Kornelia Ellwanger, Marcos Fernandez, Mallory Freeberg, Gert-Jan van de Geijn, Roan Kanninga, Vatsalya Maddi, Mehdi Mehtarizadeh, Pieter Neerincx, Stephan Ossowski, Ana Rath, Dieuwke Roelofs-Prins, Marloes Stok-Benjamins, K Joeri van der Velde, Colin Veal, Gerben van der Vries, Marc Wadsley, Gregory Warren, Birte Zurek, Thomas Keane, Holm Graessner, Sergi Beltran, Morris A Swertz, Anthony J Brookes
DOI: 10.1093/gigascience/giae058 (https://doi.org/10.1093/gigascience/giae058)
Journal: GigaScience
Publication date: 2024-01-02
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413801/pdf/
Citations: 0

Abstract

The Solve-RD project brings together clinicians, scientists, and patient representatives from 51 institutes spanning 15 countries to collaborate on genetically diagnosing ("solving") rare diseases (RDs). The project aims to significantly increase the diagnostic success rate by co-analyzing data from thousands of RD cases, including phenotypes, pedigrees, exome/genome sequencing, and multiomics data. Here we report on the data infrastructure devised and created to support this co-analysis. This infrastructure enables users to store, find, connect, and analyze data and metadata in a collaborative manner. Pseudonymized phenotypic and raw experimental data are submitted to the RD-Connect Genome-Phenome Analysis Platform and processed through standardized pipelines. Resulting files and novel produced omics data are sent to the European Genome-Phenome Archive, which adds unique file identifiers and provides long-term storage and controlled access services. MOLGENIS "RD3" and Café Variome "Discovery Nexus" connect data and metadata and offer discovery services, and secure cloud-based "Sandboxes" support multiparty data analysis. This successfully deployed and useful infrastructure design provides a blueprint for other projects that need to analyze large amounts of heterogeneous data.
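
The data flow described in the abstract (submit pseudonymized data, process it through a standardized pipeline, archive it with a unique file identifier, then register it for discovery) can be pictured with a small toy model. This is only an illustrative sketch: every class, function, and identifier format below is hypothetical and does not correspond to any real API of the RD-Connect Genome-Phenome Analysis Platform, the European Genome-Phenome Archive, MOLGENIS RD3, or Café Variome Discovery Nexus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import uuid


@dataclass
class DataFile:
    """A pseudonymized file moving through the toy flow (hypothetical model)."""
    pseudonym: str                    # pseudonymized subject ID, not a real identity
    kind: str                         # e.g. "phenotype" or "raw_sequencing"
    processed: bool = False
    archive_id: Optional[str] = None  # set only once the file is archived


def process(f: DataFile) -> DataFile:
    """Stand-in for the standardized processing step."""
    return DataFile(f.pseudonym, f.kind, processed=True)


def archive(f: DataFile) -> DataFile:
    """Stand-in for archiving: attach a unique file identifier (made-up format)."""
    new_id = f"FILE-{uuid.uuid4().hex[:12]}"
    return DataFile(f.pseudonym, f.kind, f.processed, archive_id=new_id)


@dataclass
class DiscoveryIndex:
    """Stand-in for a discovery service: maps pseudonyms to archived file IDs."""
    by_subject: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, f: DataFile) -> None:
        if f.archive_id is None:
            raise ValueError("only archived files can be registered")
        self.by_subject.setdefault(f.pseudonym, []).append(f.archive_id)

    def find(self, pseudonym: str) -> List[str]:
        # Return a copy so callers cannot mutate the index by accident.
        return list(self.by_subject.get(pseudonym, []))


def run_flow(files: List[DataFile], index: DiscoveryIndex) -> List[DataFile]:
    """Process, archive, and register each submitted file, in that order."""
    archived = [archive(process(f)) for f in files]
    for f in archived:
        index.register(f)
    return archived


if __name__ == "__main__":
    index = DiscoveryIndex()
    submitted = [
        DataFile("P001", "phenotype"),
        DataFile("P001", "raw_sequencing"),
        DataFile("P002", "phenotype"),
    ]
    run_flow(submitted, index)
    print(index.find("P001"))  # two made-up archive IDs for subject P001
```

The point of the sketch is the separation of concerns the abstract describes: storage assigns stable identifiers, while a separate index links those identifiers to subjects so that data can be found without touching the files themselves.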

Source journal
GigaScience (Multidisciplinary Sciences)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.