Classifying Mixed Patterns of Proteins in Microscopic Images with Deep Neural Networks

B. Tymchenko, Anhelina Hramatik, Heorhii Tulchiy, S. Antoshchuk (Ukrainian: Борис Ігорович Тимченко, Ангеліна Анатоліївна Граматік, Георгій Петрович Тульчий, Світлана Григорівна Антощук; Russian: Борис Игоревич Тимченко, Ангелина Анатольевна Граматик, Георгий Петрович Тульчий, Светлана Григорьевна Антощук)
Herald of Advanced Information Technology, journal article, published 2019-01-17. DOI: 10.15276/hait.01.2019.3 (https://doi.org/10.15276/hait.01.2019.3)
Citations: 0

Abstract

Accurate diagnosis, treatment, and prognosis of diseases remain pressing problems in modern medicine. Studying data on human proteins makes it possible to identify differentially expressed proteins. These proteins are potential biomarkers for accurate diagnosis, prognosis, or the selection of individualized treatments, especially in cancer. A surprising finding of this research is that relatively few human proteins are tissue-specific. Almost half of all proteins are classified as housekeeping proteins, which are expressed in all cells. Only 2,300 proteins in the human body have been identified as tissue-enriched, meaning that their expression levels are elevated in particular tissues. Advances in high-throughput microscopy now produce images faster than they can be evaluated manually. Consequently, automating biomedical image analysis is more necessary than ever to accelerate the understanding of human cells and diseases. Historically, protein classification was limited to single patterns in one or more cell types, but to fully understand the complexity of a human cell, models must classify mixed patterns across many different human cell types. This article formulates the problem of image classification in medical research, an area where classification methods based on deep convolutional neural networks are widely used, and gives a brief overview of approaches and methods from related research. The dataset is taken from “The Human Protein Atlas”, a tissue-based map of the human proteome completed in 2014 after 11 years of research. All of its protein expression profiling data are publicly accessible in an interactive database that supports tissue-based exploration of the human proteome. The article also analyzes prior work and the methods it used.

To solve this problem, a deep neural network model is proposed that takes into account the characteristics of the domain and of the sample under study. The model is based on the Inception-v3 architecture. The optimization procedure combines several techniques for fast convergence: stochastic gradient descent with warm restarts (SGDR; a learning-rate schedule for exploring different local minima) and progressive image resizing (training starts at a small resolution that is increased with each SGDR cycle). We also propose a new method for selecting thresholds for the F1 measure. The developed model can be used to build a tool, integrated into an intelligent-microscopy medical system, that determines protein localization from high-throughput microscopy images.
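The abstract names the optimization tweaks and the F1 threshold step without giving their details, so the following is only a minimal sketch under stated assumptions. The schedule uses the standard SGDR cosine-annealing formula, evaluated once per epoch, and moves to the next training resolution at each warm restart. The learning-rate bounds, cycle lengths, growth factor `t_mult`, and image sizes are hypothetical parameters, not the authors' settings. The threshold part is a plain per-class grid search that keeps the cutoff with the highest F1 for each class. It is a common baseline and not the new threshold-selection method the article proposes, which the abstract does not describe. All function names are illustrative.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def sgdr_with_progressive_resizing(
    eta_max: float,
    eta_min: float,
    first_cycle_len: int,
    t_mult: int,
    resolutions: Sequence[int],
) -> Iterator[Tuple[int, int, float]]:
    """Yield (cycle, image_size, learning_rate) for each epoch.

    Within a cycle of length T_i, epoch t (0 <= t < T_i) gets
        lr = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T_i)).
    Each new cycle is a warm restart: lr jumps back to eta_max, the cycle
    length is multiplied by t_mult, and training switches to the next size
    in `resolutions` (one cycle per size is an assumption of this sketch).
    """
    cycle_len = first_cycle_len
    for cycle, size in enumerate(resolutions):
        for t in range(cycle_len):
            cosine = 1.0 + math.cos(math.pi * t / cycle_len)
            yield cycle, size, eta_min + 0.5 * (eta_max - eta_min) * cosine
        cycle_len *= t_mult


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN), returning 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    y_true: Sequence[int], scores: Sequence[float], grid: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, F1) for the grid value with the highest F1.

    A score >= threshold counts as a positive prediction; on ties the
    earliest value in `grid` is kept.
    """
    best_t, best_score = grid[0], -1.0
    for thr in grid:
        preds = [1 if s >= thr else 0 for s in scores]
        score = f1(y_true, preds)
        if score > best_score:
            best_t, best_score = thr, score
    return best_t, best_score


def per_class_thresholds(
    labels: Sequence[Sequence[int]],
    scores: Sequence[Sequence[float]],
    grid: Sequence[float],
) -> List[float]:
    """Fit one threshold per class (column) of a multi-label problem."""
    n_classes = len(labels[0])
    return [
        best_threshold([row[c] for row in labels], [row[c] for row in scores], grid)[0]
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Hypothetical settings: two cycles, lengths 2 and 4, at sizes 256 and 512.
    for cycle, size, lr in sgdr_with_progressive_resizing(0.1, 0.0, 2, 2, [256, 512]):
        print(f"cycle={cycle} size={size} lr={lr:.4f}")
```

The learning rate is annealed per epoch here to keep the example short; SGDR can also be updated per iteration. Thresholds should be fitted on held-out validation predictions rather than on training data, since a per-class search over training scores will overfit.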