Segmentation-based Feature Extraction for Cryo-Electron Microscopy at Medium Resolution

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Pub Date: 2020-09-21. DOI: 10.1145/3388440.3414711

Lin Chen, Ruba Jebril, K. Al Nasr


Citations: 2

Abstract

Cryo-Electron Microscopy is a biophysics technique that produces volume images of a given molecule. It can visualize large molecules and protein complexes. At high resolution, <5Å, the structure can be modeled. When the resolution is worse than 5Å, computational techniques are used to overcome the inaccuracy inherent in volume images. In this paper, we propose a segmentation-based approach that extracts important features to overcome the inherent inaccuracy of medium-resolution volume images. The features are volume components that represent local peak regions in the image. Each volume component is then classified as one of the main secondary structure elements found in protein molecules. Specifically, we built four models to classify volume components: Helix-Sheet-Loop, Helix-Binary, Sheet-Binary, and Loop-Binary. We used machine-learning-based classifiers. Seven classification models were used to classify the volume components. The work proposed in this paper is a preliminary approach to detecting secondary structure elements in medium-resolution volume images. The four machine-learning models were trained using authentic volume images from the Electron Microscopy Data Bank. No simulated or synthesized image was used for either training or testing. This is important since all existing methods use simulated images for training. Because noise is inherent in authentic images, simulated images do not represent them well. The procedure includes feature extraction, model selection, fine-tuning, and model ensembling. We tested our four models on 20% of a dataset of 3400 volume components. The methods achieved 80% accuracy for the Sheet-Binary model, 77% for Helix-Binary, 71% for Loop-Binary, and 67% for the Helix-Sheet-Loop model.
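The abstract does not spell out the segmentation procedure, so the following is only an illustrative sketch of what "volume components that represent local peak regions" could look like in code. It assumes a density threshold, a 26-voxel neighborhood for peak detection, and nearest-peak assignment of voxels; none of these choices come from the paper, and the function names (find_local_peaks, segment_by_nearest_peak, component_features, make_toy_volume) are hypothetical.

```python
from itertools import product

# All 26 neighbor offsets of a voxel in a 3D grid (assumed neighborhood).
NEIGHBOR_OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def find_local_peaks(volume, threshold):
    """Return (z, y, x) voxels above threshold with no strictly higher neighbor."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    peaks = []
    for z, y, x in product(range(nz), range(ny), range(nx)):
        value = volume[z][y][x]
        if value <= threshold:
            continue  # background voxel, ignored
        higher_neighbor = False
        for dz, dy, dx in NEIGHBOR_OFFSETS:
            pz, py, px = z + dz, y + dy, x + dx
            if 0 <= pz < nz and 0 <= py < ny and 0 <= px < nx:
                if volume[pz][py][px] > value:
                    higher_neighbor = True
                    break
        if not higher_neighbor:
            peaks.append((z, y, x))
    return peaks


def segment_by_nearest_peak(volume, peaks, threshold):
    """Assign each above-threshold voxel to its nearest peak (assumed rule)."""
    components = {peak: [] for peak in peaks}
    if not peaks:
        return components
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    for voxel in product(range(nz), range(ny), range(nx)):
        z, y, x = voxel
        if volume[z][y][x] <= threshold:
            continue
        # Squared Euclidean distance; ties go to the earlier peak in the list.
        nearest = min(
            peaks,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, voxel)),
        )
        components[nearest].append(voxel)
    return components


def component_features(volume, voxels):
    """Toy per-component features; the paper's actual feature set is not given."""
    densities = [volume[z][y][x] for z, y, x in voxels]
    return {
        "size": len(densities),
        "mean_density": sum(densities) / len(densities),
        "max_density": max(densities),
    }


def make_toy_volume():
    """Build a 3x3x6 grid of zeros with two density bumps along one row."""
    volume = [[[0.0] * 6 for _ in range(3)] for _ in range(3)]
    volume[1][1] = [2.0, 5.0, 1.5, 1.5, 4.0, 2.0]
    return volume


toy = make_toy_volume()
toy_peaks = find_local_peaks(toy, threshold=1.0)
toy_components = segment_by_nearest_peak(toy, toy_peaks, threshold=1.0)
for peak, voxels in toy_components.items():
    print(peak, component_features(toy, voxels))
```

In a pipeline like the one described, feature vectors of this kind would be computed for each volume component and passed to the classifiers; the real features and classifiers are those reported in the paper, not the ones sketched here.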

