Data stewardship and curation practices in AI-based genomics and automated microscopy image analysis for high-throughput screening studies: promoting robust and ethical AI applications.

IF 4.3 | Zone 3, Medicine | JCR Q2, GENETICS & HEREDITY

Human Genomics Pub Date : 2025-02-23 DOI:10.1186/s40246-025-00716-x

Asefa Adimasu Taddese, Assefa Chekole Addis, Bjorn T Tam

Human Genomics, volume 19(1), article 16. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11849233/pdf/

Citations: 0

Abstract

Background: Researchers have increasingly adopted AI and next-generation sequencing (NGS), revolutionizing genomics and high-throughput screening (HTS), and transforming our understanding of cellular processes and disease mechanisms. However, these advancements generate vast datasets requiring effective data stewardship and curation practices to maintain data integrity, privacy, and accessibility. This review consolidates existing knowledge on key aspects, including data governance, quality management, privacy measures, ownership, access control, accountability, traceability, curation frameworks, and storage systems.

Methods: We conducted a systematic literature search up to January 10, 2024, across PubMed, MEDLINE, EMBASE, Scopus, and additional scholarly platforms to examine recent advances and challenges in managing the vast and complex datasets generated by these technologies. Our search strategy employed structured keyword queries focused on four key thematic areas: data governance and management, curation frameworks, algorithmic bias and fairness, and data storage, all within the context of AI applications in genomics and microscopy. Using a realist synthesis methodology, we integrated insights from diverse frameworks to explore the multifaceted challenges associated with data stewardship in these domains. Three independent reviewers conducted data extraction and analysis, systematically categorizing the information across critical themes, including data governance, quality management, security, privacy, ownership, and access control. The study also examined specific AI considerations, such as algorithmic bias, model explainability, and the application of advanced cryptographic techniques. The review process included six stages, starting with an extensive search across multiple research databases that yielded 273 documents. Screening by broad criteria, titles, abstracts, and full texts then narrowed the pool to 38 highly relevant citations.

Results: Our findings indicated that significant research was conducted in 2023, highlighting the increasing recognition of robust data governance frameworks in AI-driven genomics and microscopy. While 36 articles extensively discussed data interoperability and sharing, AI model explainability and data augmentation remained underexplored, indicating significant gaps. The integration of diverse data types, ranging from sequencing and clinical data to proteomic and imaging data, highlighted the complexity and expansive scope of AI applications in these fields. The current challenges identified in AI-based data stewardship and curation practices are a lack of infrastructure and cost optimization; ethical and privacy considerations; access control and sharing mechanisms; large-scale data handling and analysis; and transparent data-sharing policies and practices. Proposed solutions to address issues related to data quality, privacy, and bias management include advanced cryptographic techniques, federated learning, and blockchain technology. Robust data governance measures, such as GA4GH standards, DUO versioning, and attribute-based access control, are essential for ensuring data integrity, security, and ethical use. The study also emphasized the critical role of Data Management Plans (DMPs), meticulous metadata curation, and advanced cryptographic techniques in mitigating risks related to data security and identifiability. Despite advancements, significant challenges persisted in balancing data ownership with research accessibility, integrating heterogeneous data sources, ensuring platform interoperability, and maintaining data quality. Ongoing risks of unauthorized access and data breaches underscored the need for continuous innovation in data management practices and stricter adherence to legal and ethical standards.

Conclusions: This review explored current practices and challenges in data stewardship, offering a roadmap for strengthening the governance, security, and ethical use of AI in genomics and microscopy. While robust governance frameworks and ethical practices have established a foundation for data integrity and transparency, there remains an urgent need for collaborative efforts to develop interoperable platforms and transparent data-sharing policies. Additionally, evolving legal and ethical frameworks will be crucial to addressing emerging challenges posed by AI technologies. Fostering transparency, accountability, and ethical responsibility within the research community will be key to ensuring trust and driving ethically sound scientific advancements.
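
As an illustration of the attribute-based access control named in the Results, a minimal Python sketch follows. It is not taken from the review: the names `Requester`, `Dataset`, and `is_access_allowed`, the attribute fields, and the purpose labels are all hypothetical, and a real deployment would likely draw its purpose codes from a standard vocabulary such as the GA4GH Data Use Ontology rather than free-text strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Requester:
    # Hypothetical requester attributes, chosen only for illustration.
    role: str                     # e.g. "researcher"
    institution_approved: bool    # institutional/ethics sign-off on file
    purposes: frozenset = field(default_factory=frozenset)  # declared research purposes


@dataclass(frozen=True)
class Dataset:
    # Hypothetical dataset attributes, chosen only for illustration.
    name: str
    permitted_purposes: frozenset  # purposes the consent terms allow
    requires_ethics_approval: bool = True


def is_access_allowed(requester: Requester, dataset: Dataset, purpose: str) -> bool:
    """Grant access only if every attribute-based rule is satisfied."""
    if requester.role != "researcher":
        return False
    if dataset.requires_ethics_approval and not requester.institution_approved:
        return False
    # The stated purpose must be declared by the requester and permitted by the dataset.
    return purpose in requester.purposes and purpose in dataset.permitted_purposes
```

The decision depends on attributes of the requester, the dataset, and the stated purpose rather than on a fixed per-user permission list, which is the distinguishing feature of attribute-based access control.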

Source journal

Human Genomics (GENETICS & HEREDITY)

CiteScore: 6.00
Self-citation rate: 2.20%
Review time: 11 weeks

About the journal: Human Genomics is a peer-reviewed, open access, online journal that focuses on the application of genomic analysis in all aspects of human health and disease, as well as genomic analysis of drug efficacy and safety, and comparative genomics. Topics covered by the journal include, but are not limited to: pharmacogenomics, genome-wide association studies, genome-wide sequencing, exome sequencing, next-generation deep-sequencing, functional genomics, epigenomics, translational genomics, expression profiling, proteomics, bioinformatics, animal models, statistical genetics, genetic epidemiology, human population genetics and comparative genomics.