Cluster center genes as candidate biomarkers for the classification of Leukemia

IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications Pub Date : 2014-07-07 DOI:10.1109/IISA.2014.6878769

J. Rosa, A. Magpantay, A. Gonzaga, Geoffrey A. Solano


Citations: 9

Abstract

Modern technologies such as DNA microarrays have been developed to study the transcriptome of cancer cells. They have been used in many studies for tumor classification and for the identification of marker genes associated with cancer. However, this technique often suffers from the 'curse of dimensionality'. A general approach to overcome this setback is to perform feature selection prior to classification. Biomarkers have long been used for the prognosis and diagnosis of different types of diseases, and there is a need for new and more specific biomarkers for leukemia. In this study, gene selection began with gene filtering: the inter-quartile range (IQR) of each gene's expression was computed, and the Kruskal-Wallis test, a rank-based one-way analysis of variance (ANOVA), was used to determine whether the gene is differentially expressed across the different sample types. The filtered genes were then subjected to the k-means clustering algorithm to identify candidate genes that can be used to discriminate among the four main types of leukemia (ALL, AML, CLL, CML) and non-leukemia (NoL) bone marrow samples. The selected genes were then used to build classification models with the Support Vector Machine (SVM) and Artificial Neural Network (ANN) learning algorithms. Forty samples were used to build the models and 20 samples were used to assess the models' performance. A minimum of 6 genes was found to be needed to correctly classify all samples in the training dataset into the five categories and to classify the samples in the validation dataset with high accuracy.
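The selection pipeline described in the abstract (IQR filter, Kruskal-Wallis test, k-means clustering, then a classifier trained on 40 samples and assessed on 20) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic, the IQR cutoff (`iqr_min=0.5`) and significance level (`alpha=0.05`) are arbitrary, "cluster center gene" is interpreted as the gene whose expression profile lies nearest each k-means centroid, filtering is done on the training split only, and only a linear-kernel SVM is shown (the ANN is omitted). The helper names `filter_genes` and `cluster_center_genes` are hypothetical.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def filter_genes(X, y, iqr_min=0.5, alpha=0.05):
    """Return indices of genes passing the IQR and Kruskal-Wallis filters."""
    q75, q25 = np.percentile(X, [75, 25], axis=0)
    variable = np.flatnonzero((q75 - q25) > iqr_min)  # drop near-constant genes
    classes = np.unique(y)
    kept = []
    for g in variable:
        groups = [X[y == c, g] for c in classes]
        _, p_value = kruskal(*groups)  # rank-based test across sample types
        if p_value < alpha:
            kept.append(g)
    return np.array(kept, dtype=int)


def cluster_center_genes(X, gene_idx, k=6, seed=0):
    """Cluster genes by expression profile; return the gene nearest each center."""
    profiles = X[:, gene_idx].T  # one row per gene, one column per sample
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(profiles)
    chosen = []
    for c in range(k):
        members = np.flatnonzero(km.labels_ == c)
        if members.size == 0:
            continue
        dist = np.linalg.norm(profiles[members] - km.cluster_centers_[c], axis=1)
        chosen.append(int(gene_idx[members[np.argmin(dist)]]))
    return chosen


# Synthetic stand-in for the expression matrix: 60 samples, 5 classes, 200 genes.
rng = np.random.default_rng(0)
n_per_class, n_classes, n_genes = 12, 5, 200
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(0.0, 1.0, size=(y.size, n_genes))
X[:, 100:] *= 0.2          # low-variance genes that the IQR filter should drop
for g in range(30):        # genes 0-29 are each up-regulated in one class
    X[y == g % n_classes, g] += 4.0

# 40 samples to build the model, 20 to assess it, as in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=20, stratify=y, random_state=0)

kept = filter_genes(X_train, y_train)
selected = cluster_center_genes(X_train, kept, k=6)

clf = SVC(kernel="linear").fit(X_train[:, selected], y_train)
accuracy = clf.score(X_test[:, selected], y_test)
print(f"{len(kept)} genes pass filtering; selected genes: {selected}")
print(f"validation accuracy: {accuracy:.2f}")
```

Choosing one representative gene per cluster keeps the feature set small and reduces redundancy among co-expressed genes, which is the motivation for using k-means before classification. The value `k=6` mirrors the minimum gene count reported in the abstract, but how the authors arrived at that number is not described there.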

