Uncovering differential tolerance to deletions versus substitutions with a protein language model
Grant Goldman, Prathamesh Chati, Vasilis Ntranos
Cell Systems, 101373. Journal Article. Published 2025-09-17 (Epub 2025-09-05).
DOI: 10.1016/j.cels.2025.101373 (https://doi.org/10.1016/j.cels.2025.101373)
Citation count: 0
Abstract
Deep mutational scanning (DMS) experiments have been successfully leveraged to understand genotype-to-phenotype mapping. However, the overwhelming majority of DMS studies have focused on amino acid substitutions. Thus, it remains unclear how indels shape the fitness landscape differently from substitutions. To further our understanding of the relationship between substitutions and deletions, we leveraged a protein language model to analyze every single-amino-acid deletion in the human proteome. We discovered hundreds of thousands of sites that display opposing behavior for deletions versus substitutions: sites that can tolerate being substituted but not deleted, or vice versa. We identified secondary structural elements and sequence context as important mediators of differential tolerance. Our results underscore the value of deletion-substitution comparisons at the genome-wide scale, provide novel insights into how substitutions could systematically differ from deletions, and showcase the power of protein language models to generate biological hypotheses in silico.