Classification of helical polymers with deep-learning language models

IF 3.0; Zone 3 (Biology); JCR Q3 (Biochemistry & Molecular Biology)
Daoyi Li, Wen Jiang
Journal of Structural Biology; published 2023-11-07; DOI: 10.1016/j.jsb.2023.108041
https://www.sciencedirect.com/science/article/pii/S1047847723001041
Citations: 0

Abstract

Classification of helical polymers with deep-learning language models

Many macromolecules in biological systems exist in the form of helical polymers. However, the inherent polymorphism and heterogeneity of samples complicate the reconstruction of helical polymers from cryo-EM images. Currently available 2D classification methods are effective at separating particles of interest from contaminants, but they do not effectively differentiate between polymorphs, which leaves the 2D classes heterogeneous. It is therefore crucial to develop a method that can computationally divide a dataset of polymorphic helical structures into homogeneous subsets. In this work, we used deep-learning language models to embed the filaments as vectors in a high-dimensional space and group them into clusters. Tests with both simulated and experimental datasets demonstrate that our method, HLM (Helical classification with Language Model), can effectively distinguish different types of filaments, even in the presence of many contaminants and low signal-to-noise ratios. We also demonstrate that HLM can isolate homogeneous subsets of particles from a publicly available dataset, which led to the discovery of a previously unreported filament variant with an extra density around the tau filaments.
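
The abstract describes the pipeline only at a high level: filaments are embedded as vectors and then grouped into clusters. The following is a minimal sketch of that general idea, not the authors' HLM implementation. It assumes, hypothetically, that each filament has already been reduced to a sequence of discrete tokens (for example, labels assigned to consecutive segments). It swaps in a count-based co-occurrence embedding with SVD as a simple stand-in for a trained language model, and a small k-means routine as a stand-in for the clustering step. All function names and the toy data are illustrative assumptions.

```python
import numpy as np


def make_toy_filaments(n_per_type=10, length=12, seed=0):
    """Hypothetical toy data: two filament types with disjoint token sets."""
    rng = np.random.default_rng(seed)
    type_a = [rng.integers(0, 4, size=length).tolist() for _ in range(n_per_type)]
    type_b = [rng.integers(4, 8, size=length).tolist() for _ in range(n_per_type)]
    return type_a + type_b


def token_embeddings(filaments, vocab_size, window=2, dim=2):
    """Count-based token vectors (stand-in for a language model).

    Counts how often two tokens appear within `window` positions of each
    other along a filament, then factorizes the log-scaled counts by SVD.
    """
    cooc = np.zeros((vocab_size, vocab_size))
    for tokens in filaments:
        for i, t in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    cooc[t, tokens[j]] += 1.0
    u, s, _ = np.linalg.svd(np.log1p(cooc))
    return u[:, :dim] * np.sqrt(s[:dim])


def embed_filaments(filaments, tok_vecs):
    """Represent each filament as the mean of its token vectors."""
    return np.stack([tok_vecs[np.asarray(f)].mean(axis=0) for f in filaments])


def kmeans(x, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


filaments = make_toy_filaments()
vectors = embed_filaments(filaments, token_embeddings(filaments, vocab_size=8))
labels = kmeans(vectors, k=2)
print(labels)  # one cluster label per filament
```

In this toy setting the two filament types never share tokens, so their embeddings fall in separate regions and the clustering is easy. Real cryo-EM data would involve overlapping tokens, contaminants, and noise, which is where a trained language model and a more robust clustering method would matter.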

Source journal
Journal of Structural Biology (Biology: Biochemistry & Molecular Biology)
CiteScore: 6.30
Self-citation rate: 3.30%
Articles published: 88
Review time: 65 days
About the journal: Journal of Structural Biology (JSB) has an open access mirror journal, the Journal of Structural Biology: X (JSBX), sharing the same aims and scope, editorial team, submission system and rigorous peer review. Since both journals share the same editorial system, you may submit your manuscript via either journal homepage. You will be prompted during submission (and revision) to choose in which to publish your article. The editors and reviewers are not aware of the choice you made until the article has been published online. JSB and JSBX publish papers dealing with the structural analysis of living material at every level of organization by all methods that lead to an understanding of biological function in terms of molecular and supermolecular structure. Techniques covered include:
• Light microscopy including confocal microscopy
• All types of electron microscopy
• X-ray diffraction
• Nuclear magnetic resonance
• Scanning force microscopy, scanning probe microscopy, and tunneling microscopy
• Digital image processing
• Computational insights into structure