Classification of miRNA Expression Data Using Random Forests for Cancer Diagnosis

E. Razak, F. Yusof, R. Raus
Published in: 2016 International Conference on Computer and Communication Engineering (ICCCE), 2016-07-01
DOI: 10.1109/ICCCE.2016.49 (https://doi.org/10.1109/ICCCE.2016.49)
Citations: 8

Abstract

Cancer is a leading cause of death, responsible for around 13% of all deaths worldwide. Cancer incidence is growing at an alarming rate in Malaysia and around the world. It is estimated that one in four Malaysians will develop cancer by the age of 75. Conventional methods of diagnosing cancer rely solely on skilled physicians, aided by medical imaging, to detect symptoms that usually appear only in the late stages of cancer. Furthermore, biopsy examinations are highly invasive because tissue samples must be extracted from patients. Minimally invasive cancer biomarkers exist in the form of serum proteins. Nevertheless, existing protein-based diagnosis techniques require labor-intensive analysis and have low diagnostic sensitivity. A number of studies have sought to identify novel miRNA-based cancer biomarkers. However, existing miRNA-based diagnosis techniques suffer from low diagnostic accuracy, sensitivity, and specificity. The low accuracy and sensitivity of existing techniques stem from the extremely low miRNA count in body fluids. There is also an unavoidable problem of cross-contamination between cells and exosomes during sample preparation. This paper proposes to circumvent these problems at the data analysis stage with a machine learning technique called Random Forest. The proposed system achieved 93.48% accuracy for gastric cancer and 100% accuracy for ovarian cancer. The results are promising. Despite the noise introduced during sample preparation and the low miRNA count in body fluids, the proposed system was able to identify miRNA markers responsible for the classification of cancer.
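The abstract reports results in terms of accuracy, sensitivity, and specificity. As a minimal sketch of how these three figures are conventionally computed for a two-class diagnosis task, the Python snippet below derives them from true and predicted labels. The function name `diagnostic_metrics`, the label strings, and the sample data are illustrative assumptions; they are not taken from the paper, and the printed values do not correspond to the reported results.

```python
def diagnostic_metrics(y_true, y_pred, positive="cancer"):
    """Return accuracy, sensitivity, and specificity for a binary diagnosis task.

    Hypothetical helper for illustration; `positive` names the diseased class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # cancer, detected
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # cancer, missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # healthy, flagged

    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
        # Fraction of cancer samples that were detected (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of healthy samples that were cleared (true negative rate).
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Made-up labels for illustration only; these are not the paper's data.
y_true = ["cancer", "cancer", "cancer", "cancer",
          "healthy", "healthy", "healthy", "healthy"]
y_pred = ["cancer", "cancer", "cancer", "healthy",
          "healthy", "healthy", "healthy", "cancer"]

metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since a classifier can reach high accuracy while still missing most of the minority class.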