Multi-omics Data Integration Model Based on UMAP Embedding and Convolutional Neural Network.

Impact factor: 2.4 | JCR: Q2 | Category: Mathematical & Computational Biology
Journal: Cancer Informatics | Pub date: 2022-09-28 | eCollection date: 2022-01-01 | DOI: 10.1177/11769351221124205
Bashier ElKarami, Abedalrhman Alkhateeb, Hazem Qattous, Lujain Alshomali, Behnam Shahrrava
Citations: 15

Abstract


Multi-omics Data Integration Model Based on UMAP Embedding and Convolutional Neural Network.

Introduction: Integrating multi-omics data yields a richer understanding than analyzing each omics data type separately. Various promising integrative approaches have been used to analyze multi-omics data for biomedical applications, including disease prediction, disease subtyping, and biomarker prediction, among others.

Methods: In this paper, we introduce a multi-omics data integration method that combines a gene similarity network (GSN) based on uniform manifold approximation and projection (UMAP) with convolutional neural networks (CNNs). The method uses UMAP to embed gene expression, DNA methylation, and copy number alteration (CNA) data into a lower dimension, creating two-dimensional RGB images. Gene expression is used as the reference to construct the GSN, and the other omics data are then integrated with the gene expression for better prediction. We used CNNs to predict the Gleason score levels of prostate cancer patients and the tumor stage of breast cancer patients.

Results: The proposed model achieved near-perfect performance, with accuracy above 99% and all other performance measures at the same level. It outperformed the state-of-the-art iSOM-GSN model, which constructs the GSN map using a self-organizing map (SOM).

Conclusion: The results show that UMAP, as an embedding technique, integrates multi-omics maps into the prediction model better than SOM. The proposed model can also be applied to build multi-omics prediction models for other types of cancer.
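The image-construction step described in the Methods can be illustrated with a minimal, dependency-free Python sketch. It assumes that the two-dimensional gene coordinates have already been produced by an embedding such as UMAP applied to gene expression. The function names, the channel assignment (red for gene expression, green for DNA methylation, blue for CNA), the per-patient min-max scaling, and the rule for genes that share a pixel are all illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]           # (row, col) position of a gene in the image
Image = List[List[List[float]]]   # size x size x 3 nested list (R, G, B)


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale values to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def to_pixels(coords: Sequence[Tuple[float, float]], size: int) -> List[Pixel]:
    """Map 2-D gene coordinates onto integer positions in a size x size grid."""
    xs = normalize([x for x, _ in coords])
    ys = normalize([y for _, y in coords])
    # Clamp so that a coordinate scaled to exactly 1.0 lands on the last pixel.
    return [(min(int(y * size), size - 1), min(int(x * size), size - 1))
            for x, y in zip(xs, ys)]


def patient_image(pixels: Sequence[Pixel],
                  expression: Sequence[float],
                  methylation: Sequence[float],
                  cna: Sequence[float],
                  size: int) -> Image:
    """Build one RGB image for one patient from three per-gene omics vectors."""
    image: Image = [[[0.0, 0.0, 0.0] for _ in range(size)] for _ in range(size)]
    channels = zip(normalize(expression), normalize(methylation), normalize(cna))
    for (row, col), rgb in zip(pixels, channels):
        # Assumption: when genes collide on a pixel, keep the per-channel maximum.
        image[row][col] = [max(old, new) for old, new in zip(image[row][col], rgb)]
    return image


# Hypothetical toy data: three genes with made-up coordinates and omics values.
coords = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
pixels = to_pixels(coords, size=4)
img = patient_image(pixels,
                    expression=[1.0, 2.0, 3.0],
                    methylation=[0.25, 0.5, 0.75],
                    cna=[0.0, 1.0, -1.0],
                    size=4)
```

Because the gene layout is computed once and reused for every patient, each gene keeps the same pixel position across images, so a CNN can treat spatial neighborhoods as groups of similar genes. Images built this way would then be stacked into a batch and passed to a CNN classifier.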

Source journal
Cancer Informatics (Medicine: Oncology)
CiteScore: 3.00
Self-citation rate: 5.00%
Articles published: 30
Review time: 8 weeks
About the journal: The field of cancer research relies on advances in many other disciplines, including omics technology, mass spectrometry, radio imaging, computer science, and biostatistics. Cancer Informatics provides open access to peer-reviewed high-quality manuscripts reporting bioinformatics analysis of molecular genetics and/or clinical data pertaining to cancer, emphasizing the use of machine learning, artificial intelligence, statistical algorithms, advanced imaging techniques, data visualization, and high-throughput technologies. As the leading journal dedicated exclusively to reporting the use of computational methods in cancer research and practice, Cancer Informatics leverages methodological improvements in systems biology, genomics, proteomics, metabolomics, and molecular biochemistry into the fields of cancer detection, treatment, classification, risk-prediction, prevention, outcome, and modeling.