A new statistical approach for the identification of outlier genes in cancer microarray data

Bipul Hossen
Asian Journal of Medical and Biological Research. DOI: 10.3329/AJMBR.V6I4.51248. Publication date: 2021-01-07.

Abstract

Microarray technology aims to discover genes that are differentially expressed as outliers between two or more groups of patients, which is an important task in the genomics community. The regular expression pattern of a gene may often break down in the presence of outliers, so it is essential to detect genes whose behavior looks abnormal under experimental and biological conditions. Several statistical techniques have been developed to detect outlier genes in microarray data: the t-statistic, cancer outlier profile analysis (COPA), outlier sums (OS), the outlier robust t-statistic (ORT), the maximum ordered subset t-statistic (MOST), and the least sum of ordered subset square t-statistic (LSOSS). These methods have limitations, however. In particular, when a dataset contains unusual observations, the standard distributional assumptions may be violated, and the techniques may then be unsuitable for detecting outlier genes. To address this, I have developed a new statistical technique, the “proposed t-statistic (PT)”. The performance of the PT statistic is compared with that of the existing methods on Monte Carlo simulation data, package data, and real cancer datasets. The results show that the proposed PT method identifies outlier genes well and gives results that are the best among, or identical to, those of the other methods. The proposed approach performs significantly better than the traditional methods and can contribute extensively to the medical and genomic communities. Asian J. Med. Biol. Res. December 2020, 6(4): 795-801
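As an illustration of the kind of existing statistic the abstract refers to, the following minimal Python sketch computes COPA and OS scores on synthetic data. It follows the definitions of these two methods as they are commonly described (median centering and MAD scaling over all samples, then a high percentile or a thresholded sum over the disease samples). The function names, the 1.4826 MAD constant, the 90th percentile, and the simulated data are assumptions made for this example; the PT statistic is not shown because the abstract does not define it.

```python
import numpy as np

def standardize(normal, disease):
    """Median-center and MAD-scale each gene (row) using all samples pooled.

    Assumes continuous data, so the MAD of a gene is never exactly zero.
    """
    pooled = np.hstack([normal, disease])
    med = np.median(pooled, axis=1, keepdims=True)
    # 1.4826 makes the MAD comparable to a standard deviation under normality
    mad = 1.4826 * np.median(np.abs(pooled - med), axis=1, keepdims=True)
    return (normal - med) / mad, (disease - med) / mad

def copa(normal, disease, r=90):
    """COPA-style score: r-th percentile of standardized disease samples."""
    _, z_dis = standardize(normal, disease)
    return np.percentile(z_dis, r, axis=1)

def outlier_sum(normal, disease):
    """OS-style score: sum of standardized disease values above q75 + IQR."""
    z_norm, z_dis = standardize(normal, disease)
    pooled = np.hstack([z_norm, z_dis])
    q25, q75 = np.percentile(pooled, [25, 75], axis=1)
    cutoff = (q75 + (q75 - q25))[:, None]
    return np.where(z_dis > cutoff, z_dis, 0.0).sum(axis=1)

# Synthetic example (hypothetical data): 200 genes, 20 samples per group.
rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 20))
disease = rng.normal(size=(200, 20))
disease[0, :5] += 8.0  # gene 0 is over-expressed in only 5 disease samples

copa_scores = copa(normal, disease)
os_scores = outlier_sum(normal, disease)
print("top gene by COPA:", int(np.argmax(copa_scores)))
print("top gene by OS:", int(np.argmax(os_scores)))
```

Both scores are built to react to a subset of extreme disease samples rather than to a shift in the group mean, which is the setting the abstract describes. They still depend on pooled median and MAD estimates, so heavy contamination can distort the scaling.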