Overlapping group screening for binary cancer classification with TCGA high-dimensional genomic data.

IF 0.9 4区 生物学 Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Jie-Huei Wang, Yi-Hau Chen
{"title":"Overlapping group screening for binary cancer classification with TCGA high-dimensional genomic data.","authors":"Jie-Huei Wang,&nbsp;Yi-Hau Chen","doi":"10.1142/S0219720023500130","DOIUrl":null,"url":null,"abstract":"<p><p>Precision medicine has been a global trend of medical development, wherein cancer diagnosis plays an important role. With accurate diagnosis of cancer, we can provide patients with appropriate medical treatments for improving patients' survival. Since disease developments involve complex interplay among multiple factors such as gene-gene interactions, cancer classifications based on microarray gene expression profiling data are expected to be effective, and hence, have attracted extensive attention in computational biology and medicine. However, when using genomic data to build a diagnostic model, there exist several problems to be overcome, including the high-dimensional feature space and feature contamination. In this paper, we propose using the overlapping group screening (OGS) approach to build an accurate cancer diagnosis model and predict the probability of a patient falling into some disease classification category in the logistic regression framework. This new proposal integrates gene pathway information into the procedure for identifying genes and gene-gene interactions associated with the classification of cancer outcome groups. We conduct a series of simulation studies to compare the predictive accuracy of our proposed method for cancer diagnosis with some existing machine learning methods, and find the better performances of the former method. We apply the proposed method to the genomic data of The Cancer Genome Atlas related to lung adenocarcinoma (LUAD), liver hepatocellular carcinoma (LHC), and thyroid carcinoma (THCA), to establish accurate cancer diagnosis models.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":0.9000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Bioinformatics and Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1142/S0219720023500130","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Precision medicine has been a global trend of medical development, wherein cancer diagnosis plays an important role. With accurate diagnosis of cancer, we can provide patients with appropriate medical treatments for improving patients' survival. Since disease developments involve complex interplay among multiple factors such as gene-gene interactions, cancer classifications based on microarray gene expression profiling data are expected to be effective, and hence, have attracted extensive attention in computational biology and medicine. However, when using genomic data to build a diagnostic model, there exist several problems to be overcome, including the high-dimensional feature space and feature contamination. In this paper, we propose using the overlapping group screening (OGS) approach to build an accurate cancer diagnosis model and predict the probability of a patient falling into some disease classification category in the logistic regression framework. This new proposal integrates gene pathway information into the procedure for identifying genes and gene-gene interactions associated with the classification of cancer outcome groups. We conduct a series of simulation studies to compare the predictive accuracy of our proposed method for cancer diagnosis with some existing machine learning methods, and find the better performances of the former method. We apply the proposed method to the genomic data of The Cancer Genome Atlas related to lung adenocarcinoma (LUAD), liver hepatocellular carcinoma (LHC), and thyroid carcinoma (THCA), to establish accurate cancer diagnosis models.

利用TCGA高维基因组数据筛选重叠组进行二元肿瘤分类。
精准医疗已成为全球医学发展的趋势,其中癌症诊断发挥着重要作用。通过对癌症的准确诊断,我们可以为患者提供适当的药物治疗,提高患者的生存率。由于疾病的发展涉及多种因素之间复杂的相互作用,如基因相互作用,基于微阵列基因表达谱数据的癌症分类预计是有效的,因此在计算生物学和医学中引起了广泛的关注。然而,在利用基因组数据构建诊断模型时,存在高维特征空间和特征污染等问题。在本文中,我们提出使用重叠组筛选(OGS)方法来建立准确的癌症诊断模型,并在逻辑回归框架中预测患者属于某种疾病分类类别的概率。这项新建议将基因通路信息整合到识别与癌症结果组分类相关的基因和基因-基因相互作用的程序中。我们进行了一系列的仿真研究,将我们提出的癌症诊断方法的预测精度与现有的一些机器学习方法进行比较,发现前者的性能更好。我们将该方法应用于与肺腺癌(LUAD)、肝细胞癌(LHC)和甲状腺癌(THCA)相关的癌症基因组图谱的基因组数据,建立准确的癌症诊断模型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Bioinformatics and Computational Biology
Journal of Bioinformatics and Computational Biology MATHEMATICAL & COMPUTATIONAL BIOLOGY-
CiteScore
2.10
自引率
0.00%
发文量
57
期刊介绍: The Journal of Bioinformatics and Computational Biology aims to publish high quality, original research articles, expository tutorial papers and review papers as well as short, critical comments on technical issues associated with the analysis of cellular information. The research papers will be technical presentations of new assertions, discoveries and tools, intended for a narrower specialist community. The tutorials, reviews and critical commentary will be targeted at a broader readership of biologists who are interested in using computers but are not knowledgeable about scientific computing, and equally, computer scientists who have an interest in biology but are not familiar with current thrusts nor the language of biology. Such carefully chosen tutorials and articles should greatly accelerate the rate of entry of these new creative scientists into the field.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信