Microarray Gene Expression Data for Detection Alzheimer’s Disease Using k-means and Deep Learning

2021 7th International Engineering Conference “Research & Innovation amid Global Pandemic" (IEC) Pub Date : 2021-02-24 DOI:10.1109/IEC52205.2021.9476128

Heba M. AL-Bermany, Sura Z. Al-Rashid


Citations: 2

Abstract

Microarray technology is a novel method for monitoring the expression levels of an enormous number of genes simultaneously. These gene expression levels are used to detect various diseases. The problem is that not all genes are important; some are redundant or irrelevant, and these add computational workload to the prediction process. This study therefore aims at (1) identifying the most important genes that cause Alzheimer's Disease (AD) by using feature (gene) selection to reduce the size of the high-dimensional data, where gene selection is a twofold process of removing the irrelevant genes and selecting the informative ones, and (2) predicting AD patients based on the selected subset of genes. Two gene selection methods are implemented: Analysis of Variance (ANOVA) and Mutual Information (MI). In addition, the k-means algorithm is proposed as a gene selection step, under the assumption that the relevant genes fall in the same cluster while the insignificant genes do not belong to any cluster. The proposed system is applied to a high-dimensional AD dataset that contains 16382 genes. After the informative genes are picked, prediction is performed with a Convolutional Neural Network (CNN), a model commonly used in many prediction tasks. System performance is evaluated by prediction accuracy. The results show that k-means clustering-based gene selection can produce a subset of key genes. The k-means algorithm with the CNN model achieves 0.929 accuracy on the gene subset from the ANOVA method and 0.886 accuracy on the gene subset from the MI method. Thus, the selected gene subset achieves better prediction accuracy with less processing time.
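The selection stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the k-means step picks its cluster, so the rule used here (keep the cluster whose genes have the highest mean ANOVA F-score) is an assumption, as are the function names `anova_f`, `kmeans` and `select_genes`, the parameters `top_m` and `k`, and the synthetic data. The MI variant and the CNN classifier are omitted.

```python
import random
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one gene across labelled samples."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand, n, k = fmean(values), len(values), len(groups)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    assign = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new == assign:  # assignments stable: converged
            break
        assign = new
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [fmean(col) for col in zip(*members)]
    return assign


def select_genes(X, y, top_m, k=2):
    """Rank genes by ANOVA F, cluster the top_m expression profiles,
    and keep the cluster with the highest mean F (assumed rule)."""
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]
    scores = [anova_f(col, y) for col in columns]
    top = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_m]
    assign = kmeans([columns[g] for g in top], k)

    def cluster_score(c):
        member = [scores[g] for g, a in zip(top, assign) if a == c]
        return fmean(member) if member else float("-inf")

    best = max(range(k), key=cluster_score)
    return sorted(g for g, a in zip(top, assign) if a == best)


# Synthetic data (hypothetical): 20 samples x 30 genes, where only
# genes 0-4 are shifted in the positive class.
rng = random.Random(0)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(0, 1) + (5 if g < 5 and label == 1 else 0) for g in range(30)]
     for label in y]
selected = select_genes(X, y, top_m=10)
print(selected)
```

Each gene is treated as a point whose coordinates are its expression values across samples, so genes with similar expression profiles end up in the same cluster. In a full pipeline the columns in `selected` would then be passed to the classifier.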

