Classification of helical polymers with deep-learning language models
Daoyi Li, Wen Jiang
Journal of Structural Biology, volume 215, issue 4, Article 108041. Published 2023-11-07.
DOI: 10.1016/j.jsb.2023.108041
URL: https://www.sciencedirect.com/science/article/pii/S1047847723001041
Citation count: 0
Abstract
Many macromolecules in biological systems exist in the form of helical polymers. However, the inherent polymorphism and heterogeneity of samples complicate the reconstruction of helical polymers from cryo-EM images. Currently available 2D classification methods are effective at separating particles of interest from contaminants, but they do not effectively differentiate between polymorphs, which leaves the 2D classes heterogeneous. It is therefore crucial to develop a method that can computationally divide a dataset of polymorphic helical structures into homogeneous subsets. In this work, we used deep-learning language models to embed the filaments as vectors in a high-dimensional space and group them into clusters. Tests with both simulated and experimental datasets demonstrate that our method, HLM (Helical classification with Language Model), can effectively distinguish different types of filaments, even in the presence of many contaminants and at low signal-to-noise ratios. We also demonstrate that HLM can isolate homogeneous subsets of particles from a publicly available dataset, resulting in the discovery of a previously unreported filament variant with an extra density around the tau filaments.
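The embed-then-cluster idea in the abstract can be illustrated with a small sketch. The abstract does not say how filaments are tokenized, which language model is used, or how the vectors are clustered, so everything below is an assumption made for illustration: each filament is taken to be a sequence of integer tokens (for example, hypothetical 2D class IDs of its segments), the deep-learning language model is replaced by a classical stand-in (an SVD of token co-occurrence counts), and the clustering step is plain k-means. The function names `embed_filaments` and `kmeans` are hypothetical and are not part of HLM.

```python
import numpy as np


def embed_filaments(filaments, n_tokens, dim=2):
    """Embed each filament (a sequence of integer tokens) as a vector.

    Assumption: a co-occurrence SVD stands in for the language model.
    """
    # Count how often two tokens occur together in the same filament.
    cooc = np.zeros((n_tokens, n_tokens))
    for tokens in filaments:
        counts = np.bincount(tokens, minlength=n_tokens).astype(float)
        cooc += np.outer(counts, counts)
    # Token embeddings from a truncated SVD of the log-scaled counts.
    u, s, _ = np.linalg.svd(np.log1p(cooc))
    token_vecs = u[:, :dim] * np.sqrt(s[:dim])
    # A filament vector is the mean of its token vectors.
    return np.array([token_vecs[tokens].mean(axis=0) for tokens in filaments])


def kmeans(x, k, n_iter=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = x[labels == j].mean(axis=0)
    return labels


# Toy data: two hypothetical polymorphs that use disjoint sets of tokens.
filaments = [
    [0, 1, 2, 0, 1], [1, 2, 0, 1, 2], [2, 0, 1, 2, 0],  # polymorph A
    [3, 4, 5, 3, 4], [4, 5, 3, 4, 5], [5, 3, 4, 5, 3],  # polymorph B
]
vectors = embed_filaments(filaments, n_tokens=6, dim=2)
labels = kmeans(vectors, k=2)
print(labels)
```

On this toy input the first three filaments and the last three should fall into two separate clusters. Real cryo-EM data would be far noisier, and a trained language model would take the place of the SVD step used here.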
About the journal:
Journal of Structural Biology (JSB) has an open access mirror journal, the Journal of Structural Biology: X (JSBX), sharing the same aims and scope, editorial team, submission system and rigorous peer review. Since both journals share the same editorial system, you may submit your manuscript via either journal homepage. You will be prompted during submission (and revision) to choose the journal in which to publish your article. The editors and reviewers are not aware of the choice you made until the article has been published online. JSB and JSBX publish papers dealing with the structural analysis of living material at every level of organization by all methods that lead to an understanding of biological function in terms of molecular and supermolecular structure.
Techniques covered include:
• Light microscopy including confocal microscopy
• All types of electron microscopy
• X-ray diffraction
• Nuclear magnetic resonance
• Scanning force microscopy, scanning probe microscopy, and tunneling microscopy
• Digital image processing
• Computational insights into structure