A new statistical approach for the identification of outlier genes in cancer microarray data

Asian Journal of Medical and Biological Research. Pub Date: 2021-01-07. DOI: 10.3329/AJMBR.V6I4.51248

Bipul Hossen



Abstract

Discovering genes that are differentially expressed as outliers between two or more groups of patients is a central aim of microarray technology and an important task in the genomics community. The regular expression pattern of genes may often break down in the presence of outliers, so it is essential to detect those genes whose behavior looks abnormal under experimental and biological conditions. Several statistical techniques have been developed to detect outlier genes in microarray data: the t-statistic, cancer outlier profile analysis (COPA), outlier sums (OS), the outlier robust t-statistic (ORT), maximum ordered subset t-statistics (MOST), and least sum of ordered subset square t-statistics (LSOSS). These methods have limitations, however. In particular, when a dataset contains unusual observations, the standard distributional assumptions may be violated, and the techniques may then be unsuitable for detecting outlier genes. To address this, I have developed a new statistical technique, the “Propose t-statistic (PT)”. The performance of the PT statistic is compared with that of the existing methods on Monte Carlo simulation data, package data, and real cancer datasets. The results show that PT identifies the outlier genes well and performs as well as or better than the other methods. The proposed approach improves significantly on the traditional methods and can contribute extensively to the medical and genomic communities. Asian J. Med. Biol. Res. December 2020, 6(4): 795-801
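To make the distinction between a mean-based and an outlier-based statistic concrete, the following is a minimal sketch of two of the existing methods named in the abstract: the pooled-variance two-sample t-statistic and a COPA-style score. It is not the PT statistic, which the abstract does not define. The function names, the MAD scaling constant 1.4826, the 90th-percentile cutoff, and the toy expression values are all assumptions chosen for illustration.

```python
import math
import statistics


def t_statistic(normal, disease):
    """Pooled-variance two-sample t-statistic for a single gene."""
    n1, n2 = len(normal), len(disease)
    pooled = ((n1 - 1) * statistics.variance(normal)
              + (n2 - 1) * statistics.variance(disease)) / (n1 + n2 - 2)
    diff = statistics.mean(disease) - statistics.mean(normal)
    return diff / math.sqrt(pooled * (1 / n1 + 1 / n2))


def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def copa_score(normal, disease, q=90):
    """COPA-style score for a single gene.

    Centers by the median and scales by the MAD of all samples, then
    returns the q-th percentile of the scaled disease samples.
    Assumes the MAD is nonzero; q=90 and 1.4826 are illustrative choices.
    """
    combined = list(normal) + list(disease)
    med = statistics.median(combined)
    mad = 1.4826 * statistics.median([abs(x - med) for x in combined])
    return percentile([(x - med) / mad for x in disease], q)


# Hypothetical expression values for one gene (not from the paper).
normal = [0.0, 1.0, 2.0, 3.0, 4.0]
disease_same = [0.0, 1.0, 2.0, 3.0, 4.0]     # no difference from normal
disease_subset = [0.0, 1.0, 2.0, 3.0, 14.0]  # one overexpressed sample

print("t, no difference:    ", t_statistic(normal, disease_same))
print("t, one outlier:      ", t_statistic(normal, disease_subset))
print("COPA, no difference: ", copa_score(normal, disease_same))
print("COPA, one outlier:   ", copa_score(normal, disease_subset))
```

In this toy case only one of the five disease samples is overexpressed. The t-statistic compares group means, so a single extreme sample shifts the mean but also inflates the pooled variance, whereas the COPA-style score looks at an upper percentile of the robustly scaled disease samples and reacts directly to the extreme value.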
