Data stewardship and curation practices in AI-based genomics and automated microscopy image analysis for high-throughput screening studies: promoting robust and ethical AI applications.

Impact factor 3.8; ranking zone 3 (Medicine); JCR Q2 (GENETICS & HEREDITY)
Asefa Adimasu Taddese, Assefa Chekole Addis, Bjorn T Tam
Human Genomics 19(1): 16 (journal article, published 2025-02-23). DOI: 10.1186/s40246-025-00716-x
Citations: 0

Abstract

Background: Researchers have increasingly adopted AI and next-generation sequencing (NGS), revolutionizing genomics and high-throughput screening (HTS), and transforming our understanding of cellular processes and disease mechanisms. However, these advancements generate vast datasets requiring effective data stewardship and curation practices to maintain data integrity, privacy, and accessibility. This review consolidates existing knowledge on key aspects, including data governance, quality management, privacy measures, ownership, access control, accountability, traceability, curation frameworks, and storage systems.

Methods: We conducted a systematic literature search up to January 10, 2024, across PubMed, MEDLINE, EMBASE, Scopus, and additional scholarly platforms to examine recent advances and challenges in managing the vast and complex datasets generated by these technologies. Our search strategy used structured keyword queries focused on four thematic areas: data governance and management, curation frameworks, algorithmic bias and fairness, and data storage, all within the context of AI applications in genomics and microscopy. Using a realist synthesis methodology, we integrated insights from diverse frameworks to explore the multifaceted challenges associated with data stewardship in these domains. Three independent reviewers conducted data extraction and analysis, systematically categorizing the information across critical themes, including data governance, quality management, security, privacy, ownership, and access control. The study also examined AI-specific considerations such as algorithmic bias, model explainability, and the application of advanced cryptographic techniques. The review process comprised six stages, beginning with an extensive search across multiple research databases that yielded 273 documents. Screening against broad eligibility criteria, then by title, abstract, and full text, narrowed the pool to 38 highly relevant citations.

Results: Our findings indicated that a substantial share of the research was conducted in 2023, reflecting increasing recognition of robust data governance frameworks in AI-driven genomics and microscopy. While 36 articles discussed data interoperability and sharing extensively, AI-model explainability and data augmentation remained underexplored, indicating significant gaps. The integration of diverse data types, ranging from sequencing and clinical data to proteomic and imaging data, highlighted the complexity and expansive scope of AI applications in these fields. The current challenges identified in AI-based data stewardship and curation practices are lack of infrastructure and cost optimization; ethical and privacy considerations; access control and sharing mechanisms; large-scale data handling and analysis; and transparent data-sharing policies and practices. Proposed solutions to address data quality, privacy, and bias management include advanced cryptographic techniques, federated learning, and blockchain technology. Robust data governance measures, such as Global Alliance for Genomics and Health (GA4GH) standards, Data Use Ontology (DUO) versioning, and attribute-based access control, are essential for ensuring data integrity, security, and ethical use. The study also emphasized the critical role of Data Management Plans (DMPs), meticulous metadata curation, and advanced cryptographic techniques in mitigating risks related to data security and identifiability. Despite these advances, significant challenges persisted in balancing data ownership with research accessibility, integrating heterogeneous data sources, ensuring platform interoperability, and maintaining data quality. Ongoing risks of unauthorized access and data breaches underscored the need for continuous innovation in data management practices and stricter adherence to legal and ethical standards.
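Attribute-based access control, one of the governance measures named in the Results, can be pictured as a deny-by-default policy that evaluates attributes of both the dataset and the requester. The Python sketch below is a minimal illustration under stated assumptions, not the review's method: the data-use labels only loosely echo the broad-to-narrow categories of the GA4GH Data Use Ontology (general research, health/medical/biomedical research, disease-specific research) and are not official DUO term identifiers, and the names `Dataset`, `AccessRequest`, `purpose_allowed`, and `abac_decision`, along with the ethics and institutional-approval rules, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical data-use labels, loosely echoing the broad-to-narrow GA4GH DUO
# categories (general research > health/medical/biomedical > disease-specific).
# They are illustrative strings, not official DUO term identifiers.
GENERAL_RESEARCH = "general_research"
HEALTH_MEDICAL = "health_medical"
DISEASE_SPECIFIC = "disease_specific"


@dataclass(frozen=True)
class Dataset:
    """Attributes attached to a curated dataset (hypothetical schema)."""
    name: str
    data_use: str                       # one of the labels above
    disease: Optional[str] = None       # only meaningful for DISEASE_SPECIFIC
    requires_ethics_approval: bool = True


@dataclass(frozen=True)
class AccessRequest:
    """Attributes of the requesting researcher and project (hypothetical schema)."""
    researcher: str
    purpose: str                        # "general", "health_medical" or "disease"
    disease: Optional[str] = None
    has_ethics_approval: bool = False
    institution_approved: bool = False


def purpose_allowed(ds: Dataset, req: AccessRequest) -> bool:
    """Match the stated research purpose against the dataset's data-use label."""
    if ds.data_use == GENERAL_RESEARCH:
        return True
    if ds.data_use == HEALTH_MEDICAL:
        return req.purpose in ("health_medical", "disease")
    if ds.data_use == DISEASE_SPECIFIC:
        return req.purpose == "disease" and req.disease == ds.disease
    return False  # unrecognized label: deny by default


def abac_decision(ds: Dataset, req: AccessRequest) -> bool:
    """Grant access only if every attribute rule passes; otherwise deny."""
    if not req.institution_approved:
        return False
    if ds.requires_ethics_approval and not req.has_ethics_approval:
        return False
    return purpose_allowed(ds, req)


if __name__ == "__main__":
    cohort = Dataset("hts_imaging_cohort", DISEASE_SPECIFIC, disease="breast cancer")
    request = AccessRequest("researcher_a", "disease", disease="breast cancer",
                            has_ethics_approval=True, institution_approved=True)
    print(abac_decision(cohort, request))  # True
```

Returning `False` for any unrecognized label or missing approval keeps the policy conservative; a real deployment would presumably derive these attributes from authenticated credentials and curated metadata rather than from hard-coded fields.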

Conclusions: These findings outline current practices and challenges in data stewardship and offer a roadmap for strengthening the governance, security, and ethical use of AI in genomics and microscopy. While robust governance frameworks and ethical practices have established a foundation for data integrity and transparency, there remains an urgent need for collaborative efforts to develop interoperable platforms and transparent data-sharing policies. Additionally, evolving legal and ethical frameworks will be crucial to addressing emerging challenges posed by AI technologies. Fostering transparency, accountability, and ethical responsibility within the research community will be key to ensuring trust and driving ethically sound scientific advancements.

Source journal: Human Genomics
Subject category: GENETICS & HEREDITY
CiteScore: 6.00
Self-citation rate: 2.20%
Articles published: 55
Review time: 11 weeks
About the journal: Human Genomics is a peer-reviewed, open access, online journal that focuses on the application of genomic analysis in all aspects of human health and disease, as well as genomic analysis of drug efficacy and safety, and comparative genomics. Topics covered by the journal include, but are not limited to: pharmacogenomics, genome-wide association studies, genome-wide sequencing, exome sequencing, next-generation deep-sequencing, functional genomics, epigenomics, translational genomics, expression profiling, proteomics, bioinformatics, animal models, statistical genetics, genetic epidemiology, human population genetics and comparative genomics.