Enformation Theory: A Framework for Evaluating Genomic AI

Eyes S Robson, Nilah M. Ioannidis
bioRxiv - Genomics, published 2024-09-07. DOI: 10.1101/2024.09.03.611127 (https://doi.org/10.1101/2024.09.03.611127)

Abstract

The nascent field of genomic AI is rapidly expanding with new models, benchmarks, and findings. As the field diversifies, there is an increased need for a common set of measurement tools and perspectives to standardize model evaluation. Here, we present a statistically grounded framework for performance evaluation, visualization, and interpretation using the prominent genomic AI model Enformer as a case study. The Enformer model has been used for a range of applications from mechanism discovery to variant effect prediction, but what makes it better or worse than precedent models at particular tasks? Our goal is not merely to answer this question for Enformer, but to propose how we should think about new models in general. We start by reporting Enformer's few-shot performance on the GUANinE benchmark, which emphasizes complex genome interpretation tasks, and discuss its gains and deficits compared to precedent models. We follow this analysis with visualizations of Enformer's embeddings in low-dimensional space, where, among other insights, we diagnose features of the embeddings that may limit model generalization to synthetic biology tasks. Finally, we present a novel, theory-backed probe of Enformer embeddings, where variance decomposition allows for holistic interpretation and partial 'backtracking' to explanatory causal features. Through this case study, we illustrate a new framework, Enformation Theory, for analyzing and interpreting genomic AI models.