Data stewardship and curation practices in AI-based genomics and automated microscopy image analysis for high-throughput screening studies: promoting robust and ethical AI applications.
Asefa Adimasu Taddese, Assefa Chekole Addis, Bjorn T Tam
Human Genomics 19(1):16. Published 2025-02-23. DOI: 10.1186/s40246-025-00716-x
Abstract
Background: Researchers have increasingly adopted AI and next-generation sequencing (NGS), which have revolutionized genomics and high-throughput screening (HTS) and transformed our understanding of cellular processes and disease mechanisms. However, these advances generate vast datasets that require effective data stewardship and curation practices to maintain data integrity, privacy, and accessibility. This review consolidates existing knowledge on key aspects, including data governance, quality management, privacy measures, ownership, access control, accountability, traceability, curation frameworks, and storage systems.
Methods: We conducted a systematic literature search up to January 10, 2024, across PubMed, MEDLINE, EMBASE, Scopus, and additional scholarly platforms to examine recent advances and challenges in managing the vast and complex datasets generated by these technologies. Our search strategy employed structured keyword queries focused on four thematic areas: data governance and management, curation frameworks, algorithmic bias and fairness, and data storage, all within the context of AI applications in genomics and microscopy. Using a realist synthesis methodology, we integrated insights from diverse frameworks to explore the multifaceted challenges of data stewardship in these domains. Three independent reviewers conducted data extraction and analysis, systematically categorizing the information across critical themes, including data governance, quality management, security, privacy, ownership, and access control. The study also examined AI-specific considerations, such as algorithmic bias, model explainability, and the application of advanced cryptographic techniques. The review process comprised six stages, beginning with an extensive search across multiple research databases that yielded 273 documents. Subsequent screening based on broad criteria, titles, abstracts, and full texts narrowed the pool to 38 highly relevant citations.
Results: Our findings indicated that a substantial share of the research was conducted in 2023, reflecting increasing recognition of the need for robust data governance frameworks in AI-driven genomics and microscopy. While 36 articles extensively discussed data interoperability and sharing, AI model explainability and data augmentation remained underexplored, indicating significant gaps. The integration of diverse data types, ranging from sequencing and clinical data to proteomic and imaging data, highlighted the complexity and expansive scope of AI applications in these fields. The current challenges identified in AI-based data stewardship and curation practices were lack of infrastructure and cost optimization, ethical and privacy considerations, access control and sharing mechanisms, large-scale data handling and analysis, and transparent data-sharing policies and practices. Proposed solutions to address issues related to data quality, privacy, and bias management include advanced cryptographic techniques, federated learning, and blockchain technology. Robust data governance measures, such as Global Alliance for Genomics and Health (GA4GH) standards, Data Use Ontology (DUO) versioning, and attribute-based access control, are essential for ensuring data integrity, security, and ethical use. The study also emphasized the critical role of Data Management Plans (DMPs), meticulous metadata curation, and advanced cryptographic techniques in mitigating risks related to data security and identifiability. Despite these advances, significant challenges persisted in balancing data ownership with research accessibility, integrating heterogeneous data sources, ensuring platform interoperability, and maintaining data quality. Ongoing risks of unauthorized access and data breaches underscored the need for continuous innovation in data management practices and stricter adherence to legal and ethical standards.
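To make the pairing of consent codes with attribute-based access control more concrete, the following is a minimal sketch, not the authors' method and not a GA4GH implementation. It assumes a simplified, hypothetical purpose hierarchy loosely modelled on three DUO permission terms (GRU, general research use; HMB, health/medical/biomedical research; DS, disease-specific research) and three illustrative attributes (declared purpose, processing region, ethics approval). All names (`Dataset`, `Request`, `Decision`, `evaluate`, `PURPOSE_COVERAGE`, `term_version`) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical purpose hierarchy, loosely modelled on DUO permission terms.
# Each dataset consent code maps to the request purposes it admits:
# GRU (general research use) admits all three, HMB (health/medical/biomedical)
# admits HMB and DS, and DS (disease-specific) admits only DS.
# Real DUO semantics (modifiers, disease terms) are far richer than this.
PURPOSE_COVERAGE = {
    "GRU": {"GRU", "HMB", "DS"},
    "HMB": {"HMB", "DS"},
    "DS": {"DS"},
}


@dataclass(frozen=True)
class Dataset:
    dataset_id: str
    consent_code: str              # e.g. "GRU", "HMB", "DS"
    term_version: str              # version of the consent vocabulary used to annotate it
    allowed_regions: frozenset     # regions where processing is permitted
    requires_ethics_approval: bool = True


@dataclass(frozen=True)
class Request:
    requester_id: str
    purpose: str                   # declared research purpose, same codes as above
    region: str
    has_ethics_approval: bool


@dataclass
class Decision:
    allowed: bool
    term_version: str              # recorded so the decision can be traced later
    reasons: list = field(default_factory=list)


def evaluate(request: Request, dataset: Dataset) -> Decision:
    """Allow access only if every attribute rule passes; collect denial reasons."""
    reasons = []
    # Unknown consent codes cover no purpose, so they fail closed.
    covered = PURPOSE_COVERAGE.get(dataset.consent_code, set())
    if request.purpose not in covered:
        reasons.append(
            f"purpose {request.purpose} not covered by {dataset.consent_code}"
        )
    if request.region not in dataset.allowed_regions:
        reasons.append(f"region {request.region} not permitted")
    if dataset.requires_ethics_approval and not request.has_ethics_approval:
        reasons.append("ethics approval required")
    return Decision(
        allowed=not reasons, term_version=dataset.term_version, reasons=reasons
    )


if __name__ == "__main__":
    cohort = Dataset("cohort-01", "HMB", "v1", frozenset({"EU"}))
    print(evaluate(Request("alice", "DS", "EU", True), cohort))
    print(evaluate(Request("bob", "GRU", "US", False), cohort))
```

In the usage example, the disease-specific request from an approved EU requester is allowed, while the general-research request from an unapproved US requester is denied with three recorded reasons. Failing closed on unknown codes, and returning reasons plus the vocabulary version instead of a bare boolean, are design choices aimed at the accountability and traceability themes the review raises. A production system would more plausibly rely on a dedicated policy engine and the official DUO ontology.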
Conclusions: These findings outline current practices and challenges in data stewardship and offer a roadmap for strengthening the governance, security, and ethical use of AI in genomics and microscopy. While robust governance frameworks and ethical practices have established a foundation for data integrity and transparency, there remains an urgent need for collaborative efforts to develop interoperable platforms and transparent data-sharing policies. Additionally, evolving legal and ethical frameworks will be crucial for addressing emerging challenges posed by AI technologies. Fostering transparency, accountability, and ethical responsibility within the research community will be key to ensuring trust and driving ethically sound scientific advances.
About the journal:
Human Genomics is a peer-reviewed, open access, online journal that focuses on the application of genomic analysis in all aspects of human health and disease, as well as genomic analysis of drug efficacy and safety, and comparative genomics.
Topics covered by the journal include, but are not limited to: pharmacogenomics, genome-wide association studies, genome-wide sequencing, exome sequencing, next-generation deep-sequencing, functional genomics, epigenomics, translational genomics, expression profiling, proteomics, bioinformatics, animal models, statistical genetics, genetic epidemiology, human population genetics and comparative genomics.