Yoshiaki Sota, S. Seno, Y. Takenaka, S. Noguchi, H. Matsuda
2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE), published 2016-10-01. DOI: 10.1109/BIBE.2016.51 (https://doi.org/10.1109/BIBE.2016.51)
Citations: 3
Comparative Analysis of Transformation Methods for Gene Expression Profiles in Breast Cancer Datasets
Gene expression profiling is increasingly used in clinical practice. Integrating expression data across multiple experiments provides better insight into the heterogeneity of the biology being examined. However, batch effects arising from differences between platforms or laboratories remain a barrier to systematically analyzing data across datasets. Several methods (such as ComBat) have been proposed to remove batch effects, but they often assume that the underlying data follow an ideal distribution. Difficulties can therefore be expected when comparing datasets that have fundamentally different (dataset-dependent) distributions; clinical datasets, for example, are often collected from patient samples at various disease stages or conditions. We therefore compared several mathematical transformations across many datasets, including the nonparametric Z scaling transformation method (NPZ) that we have proposed for clinical use.

We selected 2,813 patients with available information on estrogen receptor (ER) status or human epidermal growth factor receptor 2 (HER2) status from 24 Affymetrix HG-U133 (GPL96) or Affymetrix HG-U133 Plus 2.0 (GPL570) datasets in the Gene Expression Omnibus database. The microarray expression data were processed with one of four methods: Raw (background correction and log transformation only), Microarray Suite 5.0 (MAS5), frozen robust multiarray analysis (fRMA), or radius minimax (RMX). The normalized data were then either left untransformed or transformed with one of four single-array-based methods (RANK, Z, NPZ, or YuGene), giving five options in total. Finally, we compared the ER and HER2 statuses assessed by immunohistochemical (IHC) staining with the corresponding mRNA expression. We found that applying a single-array-based transformation in addition to normalization improved the concordance rates with IHC staining.
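To make the idea of a single-array-based transformation concrete, the following is a minimal Python sketch, not the authors' implementation. It shows a rank transformation and a Z transformation, each computed from the values of one array only. The abstract does not give formulas for NPZ or YuGene, so neither is reproduced here; the `robust_z_transform` function is a hypothetical median/MAD stand-in included only to illustrate what a nonparametric scaling could look like. All function names and the toy data are assumptions.

```python
import numpy as np


def rank_transform(x):
    """Replace each value by its rank within the array, scaled to (0, 1].

    Ties are broken by position (stable sort), not averaged.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks / len(x)


def z_transform(x):
    """Centre on the array mean and scale by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def robust_z_transform(x):
    """Hypothetical stand-in: centre on the median, scale by the MAD.

    This is NOT the published NPZ definition, which the abstract does not state.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # consistency factor for normal data
    return (x - med) / mad


# Toy log2 intensities for five hypothetical probes on one array.
array_a = np.array([4.0, 6.0, 8.0, 10.0, 12.0])
# The same sample measured with a constant offset, mimicking a simple batch shift.
array_b = array_a + 2.0

for name, fn in [("RANK", rank_transform), ("Z", z_transform), ("robust Z", robust_z_transform)]:
    print(name, np.allclose(fn(array_a), fn(array_b)))
```

Each transformation uses only the values on a single array, so it can be applied to a new patient sample without reference to any other dataset. In this toy case all three remove the constant shift between `array_a` and `array_b`; real batch effects are generally more complex than an additive offset.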
Using breast cancer samples, we demonstrated the influence of transformation and showed that adding single-array-based transformations to microarray expression data resulted in stronger correlations with IHC staining.
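The comparison between IHC status and mRNA expression can be illustrated with a simple concordance calculation. The sketch below is an assumption about how such a rate might be computed; the abstract does not state the cutoff, the probes used, or the exact concordance measure. The `concordance_rate` function, the cutoff of 0.0, and the six toy samples are all hypothetical.

```python
import numpy as np


def concordance_rate(ihc_positive, expression, cutoff):
    """Fraction of samples whose mRNA-based call matches the IHC call.

    A sample is called mRNA-positive when its (transformed) expression
    value is at or above `cutoff`. The cutoff is an illustrative assumption.
    """
    ihc_positive = np.asarray(ihc_positive, dtype=bool)
    mrna_positive = np.asarray(expression, dtype=float) >= cutoff
    return float(np.mean(ihc_positive == mrna_positive))


# Hypothetical ER status by IHC for six patients.
ihc_er = [True, True, False, False, True, False]
# Hypothetical transformed expression values for an ER-related probe.
er_expr = [1.2, 0.8, -0.9, 0.3, 0.1, -1.1]

rate = concordance_rate(ihc_er, er_expr, cutoff=0.0)
print(f"Concordance with IHC: {rate:.3f}")
```

Computing the same rate for each combination of normalization and transformation would allow the options to be compared side by side, which is the kind of comparison the abstract describes.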