James Strudwick, Laura-Jayne Gardiner, Kate Denning-James, Niina Haiminen, Ashley Evans, Jennifer Kelly, Matthew Madgwick, Filippo Utro, Ed Seabolt, Christopher Gibson, Bharat Bedi, Daniel Clayton, Ciaron Howell, Laxmi Parida, Anna Paola Carrieri
{"title":"AutoXAI4Omics: an automated explainable AI tool for omics and tabular data.","authors":"James Strudwick, Laura-Jayne Gardiner, Kate Denning-James, Niina Haiminen, Ashley Evans, Jennifer Kelly, Matthew Madgwick, Filippo Utro, Ed Seabolt, Christopher Gibson, Bharat Bedi, Daniel Clayton, Ciaron Howell, Laxmi Parida, Anna Paola Carrieri","doi":"10.1093/bib/bbae593","DOIUrl":null,"url":null,"abstract":"<p><p>Machine learning (ML) methods offer opportunities for gaining insights into the intricate workings of complex biological systems, and their applications are increasingly prominent in the analysis of omics data to facilitate tasks, such as the identification of novel biomarkers and predictive modeling of phenotypes. For scientists and domain experts, leveraging user-friendly ML pipelines can be incredibly valuable, enabling them to run sophisticated, robust, and interpretable models without requiring in-depth expertise in coding or algorithmic optimization. By streamlining the process of model development and training, researchers can devote their time and energies to the critical tasks of biological interpretation and validation, thereby maximizing the scientific impact of ML-driven insights. Here, we present an entirely automated open-source explainable AI tool, AutoXAI4Omics, that performs classification and regression tasks from omics and tabular numerical data. AutoXAI4Omics accelerates scientific discovery by automating processes and decisions made by AI experts, e.g. selection of the best feature set, hyper-tuning of different ML algorithms and selection of the best ML model for a specific task and dataset. Prior to ML analysis AutoXAI4Omics incorporates feature filtering options that are tailored to specific omic data types. Moreover, the insights into the predictions that are provided by the tool through explainability analysis highlight associations between omic feature values and the targets under investigation, e.g. predicted phenotypes, facilitating the identification of novel actionable insights. AutoXAI4Omics is available at: https://github.com/IBM/AutoXAI4Omics.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583442/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Briefings in bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bib/bbae593","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Machine learning (ML) methods offer opportunities for gaining insights into the intricate workings of complex biological systems, and their applications are increasingly prominent in the analysis of omics data to facilitate tasks, such as the identification of novel biomarkers and predictive modeling of phenotypes. For scientists and domain experts, leveraging user-friendly ML pipelines can be incredibly valuable, enabling them to run sophisticated, robust, and interpretable models without requiring in-depth expertise in coding or algorithmic optimization. By streamlining the process of model development and training, researchers can devote their time and energies to the critical tasks of biological interpretation and validation, thereby maximizing the scientific impact of ML-driven insights. Here, we present an entirely automated open-source explainable AI tool, AutoXAI4Omics, that performs classification and regression tasks from omics and tabular numerical data. AutoXAI4Omics accelerates scientific discovery by automating processes and decisions made by AI experts, e.g. selection of the best feature set, hyper-tuning of different ML algorithms and selection of the best ML model for a specific task and dataset. Prior to ML analysis AutoXAI4Omics incorporates feature filtering options that are tailored to specific omic data types. Moreover, the insights into the predictions that are provided by the tool through explainability analysis highlight associations between omic feature values and the targets under investigation, e.g. predicted phenotypes, facilitating the identification of novel actionable insights. AutoXAI4Omics is available at: https://github.com/IBM/AutoXAI4Omics.
机器学习(ML)方法为深入了解复杂生物系统的错综复杂的运作提供了机会,其应用在omics 数据分析中的地位日益突出,为新型生物标记物的鉴定和表型预测建模等任务提供了便利。对于科学家和领域专家来说,利用用户友好型 ML 管道具有难以置信的价值,使他们能够运行复杂、强大和可解释的模型,而不需要深入的编码或算法优化专业知识。通过简化模型开发和训练过程,研究人员可以将时间和精力投入到生物学解释和验证的关键任务中,从而最大限度地发挥 ML 驱动的见解的科学影响力。在这里,我们介绍一种完全自动化的开源可解释人工智能工具 AutoXAI4Omics,它可以从 omics 和表格数字数据中执行分类和回归任务。AutoXAI4Omics 可自动执行人工智能专家的流程和决策,例如选择最佳特征集、超调不同的 ML 算法以及为特定任务和数据集选择最佳 ML 模型,从而加速科学发现。在进行 ML 分析之前,AutoXAI4Omics 结合了针对特定 omic 数据类型量身定制的特征过滤选项。此外,该工具通过可解释性分析提供的预测见解突出了 omic 特征值与调查目标(如预测的表型)之间的关联,有助于识别新的可操作见解。AutoXAI4Omics的网址是:https://github.com/IBM/AutoXAI4Omics。
期刊介绍:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.