提高NHGRI资源生态系统的公平性和可持续性。

ArXiv Pub Date : 2025-08-19

Larry Babb, Carol Bult, Vincent J Carey, Robert J Carroll, Benjamin C Hitz, Chris J Mungall, Heidi L Rehm, Michael C Schatz, Alex Wagner

{"title":"提高NHGRI资源生态系统的公平性和可持续性。","authors":"Larry Babb, Carol Bult, Vincent J Carey, Robert J Carroll, Benjamin C Hitz, Chris J Mungall, Heidi L Rehm, Michael C Schatz, Alex Wagner","doi":"","DOIUrl":null,"url":null,"abstract":"In 2024, individuals funded by NHGRI to support genomic community resources completed a Self-Assessment Tool (SAT) to evaluate their application of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and assess their sustainability. By collecting insights from the self-administered questionnaires and conducting personal interviews, a valuable perspective was gained on the FAIRness and sustainability of the NHGRI resources. The results highlighted several challenges and key areas the NHGRI resource community could improve by working together to form recommendations to address these challenges. The next step was the formation of an Organizing Committee to identify which challenges could lead to best practices or guidelines for the community. The workshop's Organizing Committee comprised four members from the NHGRI resource community: Carol Bult, PhD, Chris Mungall, PhD, Heidi Rehm, PhD, and Michael Schatz, PhD. In December 2024, the Organizing Committee engaged with the NHGRI resource community to refine these challenges further, inviting feedback on potential focus areas for a future workshop. This collaborative approach led to two informative webinars in December 2024, highlighting specific challenges in data curation, data processing, metadata tools, and variant identifiers within the NHGRI resources. Throughout the workshop planning process, the four Organizing Committee members worked together to create and develop themes, design breakout sessions, and create a detailed agenda. The workshop's agenda was intentionally structured to ensure participants could generate implementable recommendations for the NHGRI resource community. The two-day workshop was held in Bethesda, MD, on March 3-4, 2025. The challenges received from NHGRI resources were classified into four key categories, forming the basis of the workshop. The four key categories are variant identifiers, data processing, data curation, and metadata tools. They are briefly described below, with greater details on their challenges and recommendations in subsequent sections. Metadata Tools:While metadata is vital for capturing context in genomic datasets, its usage and relevance can vary by domain, making it difficult to standardize usage. While various methods exist for annotating and extracting metadata, incomplete or inconsistent annotations often result in ineffective data sharing and interoperability, further reducing data usability and reproducibility.Data Curation:Curation of annotations for genomics data is critical for FAIR-ness. Scalable curation solutions are challenging because of the multiple components for curation, including harmonizing data sets, data cleaning, and annotation. The workshop focused on identifying which aspects of data curation could be streamlined using computational methods while considering the barriers to increased automation.Variant Identifiers:Variant identifiers are standardized representations of genetic variants, crucial for sharing and interpreting genomic data in research and clinical work. They ensure consistent referencing and enable data aggregation. Standardizing variant identifiers is difficult due to varied formats, complex data, and distinct environments for generating and disseminating data.Data Processing:Data processing is a necessary first step in a FAIR environment. As there are many variant workflows, streamlining this process will ensure greater accuracy, reproducibility, interoperability, and FAIRness, driving advancements in clinical research. The workshop focused on addressing these aspects with a key focus on improvements and best practices around data processing for an NHGRI resource. Several recommendations were made throughout the workshop's interactive sessions with the resources' participants. While many recommendations were specific to data processing, data curation, metadata tools, or variant identifiers, they can be grouped into core recommendations addressing common challenges within the NHGRI resource community. These core recommendations highlight the key themes that emerged across sessions and are listed in the nine recommendations below. Increase transparency to enable effective sharing/reproducibility (documenting, benchmarking, publishing, mapping)Develop entity schema and ontology mapping tools (between models, identifiers, etc.)Annotate tools using resources to increase findability and reuse (Examples: EDAM Ontology of Bioscientific data analysis and data management)Use standard nomenclature and identifiersMake workflows usable by researchers with limited programming expertiseImplement APIs to improve data connectivityPresent data in an interpretable manner, along with machine readabilityDevelop artificial intelligence/machine learning (AI/ML) methods for scaling curation processesAssess the impact of resources using an independent group that can assess return on investment and impact to health and scientific advancement. An additional key collaborative outcome was the development of Appendix A, which outlines ongoing and future efforts, including additional workshops, webinars, and meetings through the listed events provided by the NHGRI resource community. We hope that these activities will enable further advances in the implementation of FAIR standards and continue to foster collaboration and exchange across NHGRI resources and the global community.","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393232/pdf/","citationCount":"0","resultStr":"{\"title\":\"Improving the FAIRness and Sustainability of the NHGRI Resources Ecosystem.\",\"authors\":\"Larry Babb, Carol Bult, Vincent J Carey, Robert J Carroll, Benjamin C Hitz, Chris J Mungall, Heidi L Rehm, Michael C Schatz, Alex Wagner\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In 2024, individuals funded by NHGRI to support genomic community resources completed a Self-Assessment Tool (SAT) to evaluate their application of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and assess their sustainability. By collecting insights from the self-administered questionnaires and conducting personal interviews, a valuable perspective was gained on the FAIRness and sustainability of the NHGRI resources. The results highlighted several challenges and key areas the NHGRI resource community could improve by working together to form recommendations to address these challenges. The next step was the formation of an Organizing Committee to identify which challenges could lead to best practices or guidelines for the community. The workshop's Organizing Committee comprised four members from the NHGRI resource community: Carol Bult, PhD, Chris Mungall, PhD, Heidi Rehm, PhD, and Michael Schatz, PhD. In December 2024, the Organizing Committee engaged with the NHGRI resource community to refine these challenges further, inviting feedback on potential focus areas for a future workshop. This collaborative approach led to two informative webinars in December 2024, highlighting specific challenges in data curation, data processing, metadata tools, and variant identifiers within the NHGRI resources. Throughout the workshop planning process, the four Organizing Committee members worked together to create and develop themes, design breakout sessions, and create a detailed agenda. The workshop's agenda was intentionally structured to ensure participants could generate implementable recommendations for the NHGRI resource community. The two-day workshop was held in Bethesda, MD, on March 3-4, 2025. The challenges received from NHGRI resources were classified into four key categories, forming the basis of the workshop. The four key categories are variant identifiers, data processing, data curation, and metadata tools. They are briefly described below, with greater details on their challenges and recommendations in subsequent sections. Metadata Tools:While metadata is vital for capturing context in genomic datasets, its usage and relevance can vary by domain, making it difficult to standardize usage. While various methods exist for annotating and extracting metadata, incomplete or inconsistent annotations often result in ineffective data sharing and interoperability, further reducing data usability and reproducibility.Data Curation:Curation of annotations for genomics data is critical for FAIR-ness. Scalable curation solutions are challenging because of the multiple components for curation, including harmonizing data sets, data cleaning, and annotation. The workshop focused on identifying which aspects of data curation could be streamlined using computational methods while considering the barriers to increased automation.Variant Identifiers:Variant identifiers are standardized representations of genetic variants, crucial for sharing and interpreting genomic data in research and clinical work. They ensure consistent referencing and enable data aggregation. Standardizing variant identifiers is difficult due to varied formats, complex data, and distinct environments for generating and disseminating data.Data Processing:Data processing is a necessary first step in a FAIR environment. As there are many variant workflows, streamlining this process will ensure greater accuracy, reproducibility, interoperability, and FAIRness, driving advancements in clinical research. The workshop focused on addressing these aspects with a key focus on improvements and best practices around data processing for an NHGRI resource. Several recommendations were made throughout the workshop's interactive sessions with the resources' participants. While many recommendations were specific to data processing, data curation, metadata tools, or variant identifiers, they can be grouped into core recommendations addressing common challenges within the NHGRI resource community. These core recommendations highlight the key themes that emerged across sessions and are listed in the nine recommendations below. Increase transparency to enable effective sharing/reproducibility (documenting, benchmarking, publishing, mapping)Develop entity schema and ontology mapping tools (between models, identifiers, etc.)Annotate tools using resources to increase findability and reuse (Examples: EDAM Ontology of Bioscientific data analysis and data management)Use standard nomenclature and identifiersMake workflows usable by researchers with limited programming expertiseImplement APIs to improve data connectivityPresent data in an interpretable manner, along with machine readabilityDevelop artificial intelligence/machine learning (AI/ML) methods for scaling curation processesAssess the impact of resources using an independent group that can assess return on investment and impact to health and scientific advancement. An additional key collaborative outcome was the development of Appendix A, which outlines ongoing and future efforts, including additional workshops, webinars, and meetings through the listed events provided by the NHGRI resource community. We hope that these activities will enable further advances in the implementation of FAIR standards and continue to foster collaboration and exchange across NHGRI resources and the global community.\",\"PeriodicalId\":93888,\"journal\":{\"name\":\"ArXiv\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-08-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393232/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ArXiv\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

另一个重要的合作成果是制定了附录A，其中概述了正在进行和未来的努力，包括通过NHGRI资源社区提供的列出的活动举办的额外讲习班、网络研讨会和会议。我们希望这些活动将进一步推动公平标准的实施，并继续促进NHGRI资源和国际社会之间的合作与交流。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

本刊更多论文

Improving the FAIRness and Sustainability of the NHGRI Resources Ecosystem.

In 2024, individuals funded by NHGRI to support genomic community resources completed a Self-Assessment Tool (SAT) to evaluate their application of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and assess their sustainability. By collecting insights from the self-administered questionnaires and conducting personal interviews, a valuable perspective was gained on the FAIRness and sustainability of the NHGRI resources. The results highlighted several challenges and key areas the NHGRI resource community could improve by working together to form recommendations to address these challenges. The next step was the formation of an Organizing Committee to identify which challenges could lead to best practices or guidelines for the community. The workshop's Organizing Committee comprised four members from the NHGRI resource community: Carol Bult, PhD, Chris Mungall, PhD, Heidi Rehm, PhD, and Michael Schatz, PhD. In December 2024, the Organizing Committee engaged with the NHGRI resource community to refine these challenges further, inviting feedback on potential focus areas for a future workshop. This collaborative approach led to two informative webinars in December 2024, highlighting specific challenges in data curation, data processing, metadata tools, and variant identifiers within the NHGRI resources. Throughout the workshop planning process, the four Organizing Committee members worked together to create and develop themes, design breakout sessions, and create a detailed agenda. The workshop's agenda was intentionally structured to ensure participants could generate implementable recommendations for the NHGRI resource community. The two-day workshop was held in Bethesda, MD, on March 3-4, 2025. The challenges received from NHGRI resources were classified into four key categories, forming the basis of the workshop. The four key categories are variant identifiers, data processing, data curation, and metadata tools. They are briefly described below, with greater details on their challenges and recommendations in subsequent sections. Metadata Tools:While metadata is vital for capturing context in genomic datasets, its usage and relevance can vary by domain, making it difficult to standardize usage. While various methods exist for annotating and extracting metadata, incomplete or inconsistent annotations often result in ineffective data sharing and interoperability, further reducing data usability and reproducibility.Data Curation:Curation of annotations for genomics data is critical for FAIR-ness. Scalable curation solutions are challenging because of the multiple components for curation, including harmonizing data sets, data cleaning, and annotation. The workshop focused on identifying which aspects of data curation could be streamlined using computational methods while considering the barriers to increased automation.Variant Identifiers:Variant identifiers are standardized representations of genetic variants, crucial for sharing and interpreting genomic data in research and clinical work. They ensure consistent referencing and enable data aggregation. Standardizing variant identifiers is difficult due to varied formats, complex data, and distinct environments for generating and disseminating data.Data Processing:Data processing is a necessary first step in a FAIR environment. As there are many variant workflows, streamlining this process will ensure greater accuracy, reproducibility, interoperability, and FAIRness, driving advancements in clinical research. The workshop focused on addressing these aspects with a key focus on improvements and best practices around data processing for an NHGRI resource. Several recommendations were made throughout the workshop's interactive sessions with the resources' participants. While many recommendations were specific to data processing, data curation, metadata tools, or variant identifiers, they can be grouped into core recommendations addressing common challenges within the NHGRI resource community. These core recommendations highlight the key themes that emerged across sessions and are listed in the nine recommendations below. Increase transparency to enable effective sharing/reproducibility (documenting, benchmarking, publishing, mapping)Develop entity schema and ontology mapping tools (between models, identifiers, etc.)Annotate tools using resources to increase findability and reuse (Examples: EDAM Ontology of Bioscientific data analysis and data management)Use standard nomenclature and identifiersMake workflows usable by researchers with limited programming expertiseImplement APIs to improve data connectivityPresent data in an interpretable manner, along with machine readabilityDevelop artificial intelligence/machine learning (AI/ML) methods for scaling curation processesAssess the impact of resources using an independent group that can assess return on investment and impact to health and scientific advancement. An additional key collaborative outcome was the development of Appendix A, which outlines ongoing and future efforts, including additional workshops, webinars, and meetings through the listed events provided by the NHGRI resource community. We hope that these activities will enable further advances in the implementation of FAIR standards and continue to foster collaboration and exchange across NHGRI resources and the global community.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ArXiv

自引率

0.00%

发文量