Single-sample classification of breast cancer tumors using data-driven reference genes
Burook Misganaw, M. Vidyasagar
2016 Indian Control Conference (ICC), published 2016-03-28
DOI: 10.1109/INDIANCC.2016.7441098 (https://doi.org/10.1109/INDIANCC.2016.7441098)
Citations: 0
Abstract
Breast cancer is the most prevalent cancer among women. There are four major subtypes of breast cancer, and therapies differ for each of them. In particular, two of the four subtypes are grouped into the so-called ER-positive (ER+) subtype and are treated with tamoxifen. Therefore, an important first step in personalizing breast cancer therapy is to classify a patient as ER+ or ER- on the basis of the tumor. Due to the so-called "batch effect," measurements of tumor gene expression levels vary from one instrument to the next, or even on the same instrument from one day to the next. In the present paper, we propose that reference genes should be chosen in a data-driven manner, to consist of those genes that show very little variation across tumors. Our choice of reference genes shows far less variation than the "housekeeping" genes proposed in the literature. Using our reference genes to normalize measurements, we trained a binary classifier to discriminate between ER+ and ER- patients using the TCGA Agilent database of 519 samples. This classifier was then tested, one sample at a time, on 759 other tumors and also on the original 519 samples measured on a different platform (Affymetrix). The results are extremely satisfactory. Thus our paper is the first to present a method for classifying breast cancer tumors into ER+ and ER-, one tumor at a time, in the presence of the batch effect and platform variation.
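The idea of data-driven reference genes and single-sample normalization can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how variation is measured or how the normalization is applied, so the sketch assumes log-scale expression values, uses the standard deviation across tumors as the variation measure, and models the batch effect as an additive per-sample offset. The function names `select_reference_genes` and `normalize_sample`, the parameter `n_ref`, and the synthetic data are all hypothetical.

```python
import numpy as np


def select_reference_genes(expr, n_ref=10):
    """Return indices of the n_ref genes that vary least across tumors.

    expr has shape (n_genes, n_samples) and is assumed to be on a log scale.
    Standard deviation is an assumed variation measure, not the paper's.
    """
    spread = np.std(expr, axis=1)
    return np.argsort(spread)[:n_ref]


def normalize_sample(sample, ref_idx):
    """Normalize one tumor using only its own reference-gene measurements.

    Subtracting the mean reference-gene level cancels any additive
    per-sample offset, which is how the batch effect is modeled here.
    """
    return sample - np.mean(sample[ref_idx])


# Synthetic data (hypothetical): genes 0..9 are nearly constant across tumors.
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 50
base = rng.normal(8.0, 1.0, size=(n_genes, 1))
noise_scale = np.full((n_genes, 1), 1.0)
noise_scale[:10] = 0.01
expr = base + noise_scale * rng.normal(size=(n_genes, n_samples))

ref_idx = select_reference_genes(expr, n_ref=10)

# Simulate the same tumor measured in a different batch with a constant shift.
sample = expr[:, 0]
shifted = sample + 2.5
```

Because each sample is normalized using only its own reference-gene values, a new tumor can be processed without pooling it with other samples. Under the additive-offset assumption, `normalize_sample(sample, ref_idx)` and `normalize_sample(shifted, ref_idx)` agree, so a classifier trained on normalized data would see the same input from either batch.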