Sparse autoencoders uncover biologically interpretable features in protein language model representations.

IF 9.1 · CAS Tier 1 (Multidisciplinary) · JCR Q1 Multidisciplinary Sciences
Onkar Gujral, Mihir Bafna, Eric Alm, Bonnie Berger
{"title":"稀疏自编码器揭示了蛋白质语言模型表示中的生物可解释特征。","authors":"Onkar Gujral, Mihir Bafna, Eric Alm, Bonnie Berger","doi":"10.1073/pnas.2506316122","DOIUrl":null,"url":null,"abstract":"<p><p>Foundation models in biology-particularly protein language models (PLMs)-have enabled ground-breaking predictions in protein structure, function, and beyond. However, the \"black-box\" nature of these representations limits transparency and explainability, posing challenges for human-AI collaboration and leaving open questions about their human-interpretable features. Here, we leverage sparse autoencoders (SAEs) and a variant, transcoders, from natural language processing to extract, in a completely unsupervised fashion, interpretable sparse features present in both protein-level and amino acid (AA)-level representations from ESM2, a popular PLM. Unlike other approaches such as training probes for features, the extraction of features by the SAE is performed without any supervision. We find that many sparse features extracted from SAEs trained on protein-level representations are tightly associated with Gene Ontology (GO) terms across all levels of the GO hierarchy. We also use Anthropic's Claude to automate the interpretation of sparse features for both protein-level and AA-level representations and find that many of these features correspond to specific protein families and functions such as the NAD Kinase, IUNH, and the PTH family, as well as proteins involved in methyltransferase activity and in olfactory and gustatory sensory perception. We show that sparse features are more interpretable than ESM2 neurons across all our trained SAEs and transcoders. These findings demonstrate that SAEs offer a promising unsupervised approach for disentangling biologically relevant information present in PLM representations, thus aiding interpretability. 
This work opens the door to safety, trust, and explainability of PLMs and their applications, and paves the way to extracting meaningful biological insights across increasingly powerful models in the life sciences.</p>","PeriodicalId":20548,"journal":{"name":"Proceedings of the National Academy of Sciences of the United States of America","volume":"122 34","pages":"e2506316122"},"PeriodicalIF":9.1000,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12403088/pdf/","citationCount":"0","resultStr":"{\"title\":\"Sparse autoencoders uncover biologically interpretable features in protein language model representations.\",\"authors\":\"Onkar Gujral, Mihir Bafna, Eric Alm, Bonnie Berger\",\"doi\":\"10.1073/pnas.2506316122\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Foundation models in biology-particularly protein language models (PLMs)-have enabled ground-breaking predictions in protein structure, function, and beyond. However, the \\\"black-box\\\" nature of these representations limits transparency and explainability, posing challenges for human-AI collaboration and leaving open questions about their human-interpretable features. Here, we leverage sparse autoencoders (SAEs) and a variant, transcoders, from natural language processing to extract, in a completely unsupervised fashion, interpretable sparse features present in both protein-level and amino acid (AA)-level representations from ESM2, a popular PLM. Unlike other approaches such as training probes for features, the extraction of features by the SAE is performed without any supervision. We find that many sparse features extracted from SAEs trained on protein-level representations are tightly associated with Gene Ontology (GO) terms across all levels of the GO hierarchy. 
We also use Anthropic's Claude to automate the interpretation of sparse features for both protein-level and AA-level representations and find that many of these features correspond to specific protein families and functions such as the NAD Kinase, IUNH, and the PTH family, as well as proteins involved in methyltransferase activity and in olfactory and gustatory sensory perception. We show that sparse features are more interpretable than ESM2 neurons across all our trained SAEs and transcoders. These findings demonstrate that SAEs offer a promising unsupervised approach for disentangling biologically relevant information present in PLM representations, thus aiding interpretability. This work opens the door to safety, trust, and explainability of PLMs and their applications, and paves the way to extracting meaningful biological insights across increasingly powerful models in the life sciences.</p>\",\"PeriodicalId\":20548,\"journal\":{\"name\":\"Proceedings of the National Academy of Sciences of the United States of America\",\"volume\":\"122 34\",\"pages\":\"e2506316122\"},\"PeriodicalIF\":9.1000,\"publicationDate\":\"2025-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12403088/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the National Academy of Sciences of the United States of America\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1073/pnas.2506316122\",\"RegionNum\":1,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/8/19 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the National Academy of Sciences of the United States 
of America","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1073/pnas.2506316122","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/19 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract



Foundation models in biology-particularly protein language models (PLMs)-have enabled ground-breaking predictions in protein structure, function, and beyond. However, the "black-box" nature of these representations limits transparency and explainability, posing challenges for human-AI collaboration and leaving open questions about their human-interpretable features. Here, we leverage sparse autoencoders (SAEs) and a variant, transcoders, from natural language processing to extract, in a completely unsupervised fashion, interpretable sparse features present in both protein-level and amino acid (AA)-level representations from ESM2, a popular PLM. Unlike other approaches such as training probes for features, the extraction of features by the SAE is performed without any supervision. We find that many sparse features extracted from SAEs trained on protein-level representations are tightly associated with Gene Ontology (GO) terms across all levels of the GO hierarchy. We also use Anthropic's Claude to automate the interpretation of sparse features for both protein-level and AA-level representations and find that many of these features correspond to specific protein families and functions such as the NAD Kinase, IUNH, and the PTH family, as well as proteins involved in methyltransferase activity and in olfactory and gustatory sensory perception. We show that sparse features are more interpretable than ESM2 neurons across all our trained SAEs and transcoders. These findings demonstrate that SAEs offer a promising unsupervised approach for disentangling biologically relevant information present in PLM representations, thus aiding interpretability. This work opens the door to safety, trust, and explainability of PLMs and their applications, and paves the way to extracting meaningful biological insights across increasingly powerful models in the life sciences.
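The abstract describes training sparse autoencoders on PLM embeddings: an overcomplete hidden layer with a ReLU nonlinearity and a sparsity penalty, so that each input embedding activates only a few interpretable features. The following is a minimal numpy sketch of that architecture, not the authors' implementation; the dimensions, penalty weight, and random inputs standing in for ESM2 embeddings are all illustrative assumptions.

```python
import numpy as np

# Minimal sparse-autoencoder sketch (illustrative only, not the paper's code).
# It encodes d_model-dimensional embeddings into an overcomplete d_hidden-
# dimensional feature layer with ReLU, and reconstructs the input linearly.
rng = np.random.default_rng(0)

d_model, d_hidden = 32, 128          # embedding dim; overcomplete feature dim
W_enc = rng.normal(0, 0.1, (d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0, 0.1, (d_hidden, d_model))
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU leaves most features at exactly zero, giving sparse activations.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    return f @ W_dec + b_dec

def loss(x, l1=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparsity.
    f = encode(x)
    return np.mean((decode(f) - x) ** 2) + l1 * np.mean(np.abs(f))

# Stand-in for a batch of 16 protein-level embeddings.
x = rng.normal(size=(16, d_model))
f = encode(x)
print(f.shape)                 # (16, 128)
print(float((f == 0).mean()))  # a large fraction of features are inactive
```

In practice the weights would be fit by gradient descent on `loss`, and each learned hidden unit would then be inspected (e.g., against GO annotations) to test whether it tracks a biologically meaningful property.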

Source journal metrics: CiteScore 19.00 · Self-citation rate 0.90% · Annual articles 3,575 · Review time 2.5 months
Journal description: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.