A network database for the human biobank

Myles Axton
Advanced genetics (Hoboken, N.J.), 2(2). Journal article, published 2021-05-27. DOI: 10.1002/ggn2.10049. https://onlinelibrary.wiley.com/doi/10.1002/ggn2.10049

“Enduring trust is essential for lifelong collection of phenotypes and traits. Good metadata, local ownership and fair reuse of genomic and outcome data can be sustained in partnership among well-resourced and well-peopled regions of the world.”

A Perspective in this issue recounts the endeavors of the collaborative Virus Outbreak Data Access Network (VODAN)-Africa,[2] which collected SARS-CoV-2 outcomes using electronic case report forms (eCRFs) and other templates, and organized the data with FAIR metadata models as linked data for clinical decision making and research queries. One use case employed temporal, numerical and geolocator metadata to connect de-identified interviews with displaced people in Tunisia about their COVID-19 infection outcomes to media reports that aggregate information on the same groups. The metadata model and the data it describes were stored as linked data that could be remotely queried across the network of nine African countries.

Genome data is cheap, plentiful, and concentrated in a few wealthy places, even relative to the information web, where less than 1% of the world's servers serve over 99% of the web content.[1] This situation arises for several reasons. First, it is difficult to move petabytes of data (it is hard drives rather than bytes that travel). Second, the need for efficient search over similar formats of data usually leads to centralized accumulation of resources. Third, trusted and secure sharing of data resources among distributed sites requires metadata standards and linked data conventions that permit both computer operations without parsing or data transformation and queries from human users, who range from clinicians to bioinformaticians to government agencies. Finally, application of any agreed metadata standards needs to be rapid and very low cost if it is to be more than a specialized research and training exercise.

In contrast to genome data, personal experiences including exposures and clinical records are distributed across institutions, homes, families, and individuals. Lifelong trust that sharing this information brings better outcomes for the donors is essential if we are to use this living biobank of diverse experience to make sense of variation in both viral and human genomes. Information from affected and unaffected individuals is needed to understand the importance of even point mutations in small viral genomes—such as the SARS-CoV-2 variants that continue to cause so much disruption and disease worldwide. Yet this data has not been gathered from places where the disruption is occurring, largely because we do not yet have collection networks with the trust and capacity to sustainably return results within the region of study.

There are now several related functional technologies for linked data to deliver the aspirational goals laid out in the principles of FAIR data and services. These working together would amount to a mercantile revolution in the global data trade rather than the gold rush metaphor of the Perspective.[2] Shipping containers for ideas can be made from Research Object Crates[3, 4] bearing just enough standard metadata for basic interoperability and relabeling. Unlike cargo, however, data will not move, but instead, the user's queries will travel systematically to data containers they identify by their appropriate licenses, permissions, provenance, and descriptions for use. Autonomously controlled pods of personal information can be licensed for social cooperation, research or profit as the owner sees fit.[5, 6] This change of emphasis to good labeling for data visiting is the basis for developing products like a personal health train.[7]
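The idea of data visiting described above can be illustrated with a minimal Python sketch. It is not the VODAN, RO-Crate, or personal health train implementation: the `DataContainer` class, its fields, the `visit` function, and the sample records are all hypothetical names and values invented for this illustration. The sketch only shows the shape of the pattern, in which a query travels to each labeled container, the container's own label decides whether the query may run, and only the result leaves the site.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataContainer:
    """Hypothetical container: local records plus a label a visitor can read."""
    site: str
    license: str
    permitted_purposes: frozenset[str]
    provenance: str
    _records: list[dict] = field(default_factory=list, repr=False)

    def accepts(self, purpose: str) -> bool:
        # The container's label, not the visitor, decides whether a query may run.
        return purpose in self.permitted_purposes

    def run(self, query: Callable[[list[dict]], int]) -> int:
        # The query executes where the data lives; only its result is returned.
        return query(self._records)


def visit(containers: list[DataContainer], purpose: str,
          query: Callable[[list[dict]], int]) -> dict[str, int]:
    """Send one query to every container whose label permits the stated purpose."""
    return {c.site: c.run(query) for c in containers if c.accepts(purpose)}


def count_recovered(records: list[dict]) -> int:
    # An aggregate query: it returns a count, never the underlying records.
    return sum(1 for r in records if r.get("outcome") == "recovered")


# Invented example data for two hypothetical sites.
containers = [
    DataContainer("site-a", "research-only", frozenset({"research"}), "eCRF export",
                  [{"outcome": "recovered"}, {"outcome": "hospitalized"}]),
    DataContainer("site-b", "clinical-only", frozenset({"clinical"}), "eCRF export",
                  [{"outcome": "recovered"}]),
]

totals = visit(containers, "research", count_recovered)
print(totals)  # {'site-a': 1}
```

In this sketch, site-b is skipped because its label does not permit research use, so its records are never read. A real deployment would also need authentication, auditing, and disclosure controls on small counts, none of which are modeled here.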

The VODAN project has contributed to capacity building through its active interdisciplinary cooperation and informatics training plan and has greatly promoted the cause of equitable, autonomous, and secure data ownership. However, it may be some time before the participating sites are able both to innovate and to interoperate fully on the same network in the distributed fashion in which the project was originally conceived. Problems inherent in the stability of each local datastore's query protocol service, and the potential for inadvertent divergence in implementation, led to a tactical decision to use centrally provided metadata templates and mirrored storage at a secure central site (CEDAR) instead. This redesign shows that central providers may gain sufficient trust to act as partners, through secure hosting and an open commitment to the storage, organization, and ownership of data by the participating local informatics communities across the network. The dream of building distributed capacity together across data-rich and resource-rich regions remains as alive and compelling as ever.

Myles Axton: Writing-original draft; writing-review & editing.
