Sparse autoencoders uncover biologically interpretable features in protein language model representations.

IF 9.1; Tier 1 multidisciplinary journal; JCR Q1, MULTIDISCIPLINARY SCIENCES

Proceedings of the National Academy of Sciences of the United States of America. Pub Date: 2025-08-26; Epub Date: 2025-08-19; DOI: 10.1073/pnas.2506316122

Onkar Gujral, Mihir Bafna, Eric Alm, Bonnie Berger

Volume 122, Issue 34, Article e2506316122. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12403088/pdf/

Citations: 0

Abstract

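Foundation models in biology, particularly protein language models (PLMs), have enabled groundbreaking predictions in protein structure, function, and beyond. However, the "black-box" nature of these representations limits transparency and explainability, posing challenges for human-AI collaboration and leaving open questions about their human-interpretable features. Here, we leverage sparse autoencoders (SAEs) and a variant, transcoders, from natural language processing to extract, in a completely unsupervised fashion, interpretable sparse features present in both protein-level and amino acid (AA)-level representations from ESM2, a popular PLM. Unlike other approaches, such as training probes for features, the extraction of features by the SAE is performed without any supervision. We find that many sparse features extracted from SAEs trained on protein-level representations are tightly associated with Gene Ontology (GO) terms across all levels of the GO hierarchy. We also use Anthropic's Claude to automate the interpretation of sparse features for both protein-level and AA-level representations and find that many of these features correspond to specific protein families and functions, such as the NAD kinase, IUNH, and PTH families, as well as proteins involved in methyltransferase activity and in olfactory and gustatory sensory perception. We show that sparse features are more interpretable than ESM2 neurons across all our trained SAEs and transcoders. These findings demonstrate that SAEs offer a promising unsupervised approach for disentangling biologically relevant information present in PLM representations, thus aiding interpretability. This work opens the door to safety, trust, and explainability of PLMs and their applications, and paves the way to extracting meaningful biological insights across increasingly powerful models in the life sciences.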
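The abstract describes SAEs and transcoders only at a high level. The sketch below is a minimal, illustrative forward pass and training objective (reconstruction error plus an L1 sparsity penalty) in the style common in NLP interpretability work; it is not the authors' implementation. Mean pooling to obtain protein-level vectors, the toy dimensions, the random initialization, and the names `mean_pool` and `SparseDictionary` are assumptions made for illustration, and gradient-based training is omitted.

```python
import math
import random


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def mean_pool(residue_embeddings: list[list[float]]) -> list[float]:
    """Average AA-level (per-residue) vectors into one protein-level vector.
    Mean pooling is an assumption of this sketch."""
    n = len(residue_embeddings)
    return [sum(col) / n for col in zip(*residue_embeddings)]


class SparseDictionary:
    """Hypothetical forward pass shared by an SAE (target = input) and a
    transcoder (target = output of the layer it approximates). No training."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int, seed: int = 0):
        rng = random.Random(seed)
        # Random Gaussian initialization (illustrative choice).
        self.W_enc = [[rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
                      for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        self.W_dec = [[rng.gauss(0.0, 1.0 / math.sqrt(d_hidden)) for _ in range(d_hidden)]
                      for _ in range(d_out)]
        self.b_dec = [0.0] * d_out

    def encode(self, x: list[float]) -> list[float]:
        # Non-negative feature activations: h = ReLU(W_enc x + b_enc)
        return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W_enc, self.b_enc)]

    def decode(self, h: list[float]) -> list[float]:
        # Linear reconstruction/prediction: y_hat = W_dec h + b_dec
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.W_dec, self.b_dec)]

    def loss(self, x: list[float], target: list[float], l1_coeff: float):
        """Mean squared error plus an L1 penalty that pushes most features to zero."""
        h = self.encode(x)
        y_hat = self.decode(h)
        mse = sum((t - y) ** 2 for t, y in zip(target, y_hat)) / len(target)
        return mse + l1_coeff * sum(h), h  # h >= 0, so sum(h) equals its L1 norm


# Toy example: 2 residues with 4-dimensional embeddings (real ESM2 layers are wider).
residues = [[0.5, -1.0, 0.2, 0.0], [1.5, 0.0, -0.2, 2.0]]
protein_vec = mean_pool(residues)
sae = SparseDictionary(d_in=4, d_hidden=16, d_out=4)
total_loss, features = sae.loss(protein_vec, protein_vec, l1_coeff=1e-3)
active = [i for i, v in enumerate(features) if v > 0.0]  # indices of firing features
```

For an SAE, `target` is the input itself. For a transcoder, `x` would typically be the input to a layer such as an MLP block and `target` that layer's output, which is why `d_out` is kept separate from `d_in`. Each coordinate of the hidden vector is a candidate feature; in the paper, such features are interpreted by associating them with GO terms and with Claude-assisted annotation.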

Source journal

Proceedings of the National Academy of Sciences of the United States of America (multidisciplinary journal)

CiteScore: 19.00
Self-citation rate: 0.90%
Articles published: 3575
Review time: 2.5 months

Journal description: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.