Single-sample classification of breast cancer tumors using data-driven reference genes

Burook Misganaw, M. Vidyasagar
Published in: 2016 Indian Control Conference (ICC), 2016-03-28
DOI: https://doi.org/10.1109/INDIANCC.2016.7441098
Citations: 0

Abstract

Breast cancer is the most prevalent form of cancer among women. There are four major subtypes of breast cancer, and the therapy differs for each. In particular, two of the four subtypes are grouped into the so-called ER-positive (ER+) subtype and are treated with tamoxifen. Therefore, an important first step in personalizing breast cancer therapy is to classify a patient as ER+ or ER- on the basis of the tumor. Due to the so-called "batch effect," measurements of tumor gene expression levels vary from one instrument to the next, and even on the same instrument from one day to another. In the present paper, we propose that reference genes should be chosen in a data-driven manner, as those genes that show very little variation across tumors. Our choice of reference genes shows far less variation than the "housekeeping" genes proposed in the literature. Using our reference genes to normalize measurements, we trained a binary classifier to discriminate between ER+ and ER- patients using the TCGA Agilent database of 519 samples. This classifier was then tested, one sample at a time, on 759 other tumors and also on the original 519 samples measured on a different platform (Affymetrix). The results are extremely satisfactory. Thus our paper is the first to present a method for classifying breast cancer tumors into ER+ and ER-, one tumor at a time, in the presence of the batch effect and platform variation.
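The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state the variation measure, the normalization formula, or the classifier used, so everything below is an assumption made for illustration: reference genes are taken to be those with the smallest standard deviation across training samples, each sample is normalized by subtracting the mean of its own reference-gene values (log-scale data assumed), and a nearest-centroid rule stands in for the binary classifier. The data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def select_reference_genes(expr, n_ref):
    """Indices of the n_ref genes with the smallest spread across samples.

    expr has shape (n_samples, n_genes). Standard deviation is an assumed
    measure of variation, not necessarily the one used in the paper.
    """
    spread = expr.std(axis=0)
    return np.argsort(spread)[:n_ref]

def normalize_sample(sample, ref_idx):
    """Normalize ONE sample using only its own reference-gene values."""
    return sample - sample[ref_idx].mean()

def fit_centroids(expr, labels, ref_idx):
    """Per-class mean of the normalized training samples."""
    normed = np.array([normalize_sample(s, ref_idx) for s in expr])
    return {int(c): normed[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_sample(sample, centroids, ref_idx):
    """Assign a single sample to the nearest class centroid."""
    x = normalize_sample(sample, ref_idx)
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

def simulate(rng, n_samples, base, noise, effect, shift=0.0):
    """Synthetic log-expression data; label 1 = ER+, 0 = ER- (toy data only)."""
    labels = rng.integers(0, 2, size=n_samples)
    expr = base + noise * rng.standard_normal((n_samples, base.size))
    expr += np.outer(labels, effect)   # class-dependent genes
    return expr + shift, labels        # shift mimics an additive batch effect

rng = np.random.default_rng(0)
n_genes = 50
base = rng.uniform(5.0, 10.0, n_genes)
noise = np.full(n_genes, 1.0)
noise[:10] = 0.05                      # genes 0-9: nearly constant across tumors
noise[10:20] = 0.5
effect = np.zeros(n_genes)
effect[10:20] = 2.0                    # genes 10-19: differ between the classes

train_x, train_y = simulate(rng, 200, base, noise, effect)
test_x, test_y = simulate(rng, 100, base, noise, effect, shift=3.0)

ref_idx = select_reference_genes(train_x, n_ref=10)
centroids = fit_centroids(train_x, train_y, ref_idx)
# Each test sample is classified on its own, without access to other test samples.
pred = np.array([classify_sample(s, centroids, ref_idx) for s in test_x])
accuracy = (pred == test_y).mean()
print(f"reference genes: {sorted(ref_idx.tolist())}, accuracy: {accuracy:.2f}")
```

Because the normalization uses only quantities measured within the sample itself, a constant offset added to a whole sample cancels out. This is what allows one tumor to be classified at a time in this toy setting. Real batch and platform effects are unlikely to be purely additive, so this shows the idea only and does not reproduce the reported results.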