Protein identification using Cryo-EM and artificial intelligence guides improved sample purification
Kenneth D. Carr, Dane Evan D. Zambrano, Connor Weidle, Alex Goodson, Helen E. Eisenach, Harley Pyles, Alexis Courbet, Neil P. King, Andrew J. Borst
Journal of Structural Biology: X, volume 11, Article 100120. Published 2025-01-21. DOI: 10.1016/j.yjsbx.2025.100120
https://www.sciencedirect.com/science/article/pii/S2590152425000017
Abstract
Protein purification is essential in protein biochemistry, structural biology, and protein design, enabling the determination of protein structures, the study of biological mechanisms, and the characterization of both natural and de novo designed proteins. However, standard purification strategies often encounter challenges, such as unintended co-purification of contaminants alongside the target protein. This issue is particularly problematic for self-assembling protein nanomaterials, where unexpected geometries may reflect novel assembly states, cross-contamination, or native proteins originating from the expression host. Here, we used an automated structure-to-sequence pipeline to first identify an unknown co-purifying protein found in several purified designed protein samples. By integrating cryo-electron microscopy (Cryo-EM), ModelAngelo’s sequence-agnostic model-building, and Protein BLAST, we identified the contaminant as dihydrolipoamide succinyltransferase (DLST). This identification was validated through comparisons with DLST structures in the Protein Data Bank, AlphaFold 3 predictions based on the DLST sequence from our E. coli expression vector, and traditional biochemical methods. The identification informed subsequent modifications to our purification protocol, which successfully excluded DLST from future preparations. To explore the potential broader utility of this approach, we benchmarked four computational methods for DLST identification across varying resolution ranges. This study demonstrates the successful application of a structure-to-sequence protein identification workflow, integrating Cryo-EM, ModelAngelo, Protein BLAST, and AlphaFold 3 predictions, to identify and ultimately help guide the removal of DLST from sample purification efforts. It highlights the potential of combining Cryo-EM with AI-driven tools for accurate protein identification and addressing purification challenges across diverse contexts in protein science.
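The hand-off between sequence-agnostic model building and the database search can be illustrated with a minimal sketch. It assumes the per-chain residue sequences inferred from the map have already been extracted into a dictionary; it does not reproduce the authors' pipeline or ModelAngelo's actual output format. The function name `fragments_to_fasta`, its parameters, and the length cutoff are hypothetical choices made for illustration only. The sketch keeps the longest built fragments and formats them as FASTA text that could be pasted into a Protein BLAST query.

```python
def fragments_to_fasta(fragments, min_length=30, top_n=5, width=60):
    """Format the longest model-built fragments as FASTA text.

    fragments: dict mapping a chain label to a one-letter amino acid string.
    All names and default cutoffs here are illustrative assumptions, not
    values taken from the paper.
    """
    # Drop short fragments, which tend to give ambiguous database hits.
    kept = [(name, seq) for name, seq in fragments.items() if len(seq) >= min_length]
    # Longest fragments first; keep at most top_n of them.
    kept.sort(key=lambda item: len(item[1]), reverse=True)
    kept = kept[:top_n]

    lines = []
    for name, seq in kept:
        lines.append(f">{name} length={len(seq)}")
        # Wrap the sequence at a fixed line width, as is conventional for FASTA.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder sequences, not real DLST fragments.
    example = {"chain_A": "M" * 10, "chain_B": "A" * 70, "chain_C": "G" * 40}
    print(fragments_to_fasta(example))
```

Filtering by length before searching is a design choice of this sketch: longer contiguous fragments generally carry more sequence signal, and the appropriate cutoff would depend on map resolution.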