Pujan Joshi, Honglin Wang, B. Basso, S. Hong, C. Giardina, Dong-Guk Shin
Proceedings of the 2020 4th International Conference on Computational Biology and Bioinformatics, published 2020-12-27. DOI: 10.1145/3449258.3449262
Citation count: 0
A Framework for Route Based Pathway Analysis of Gene Expression Data
Pathway analysis is a key step in genomics studies for reducing data complexity and incorporating prior biological knowledge. Over-representation analysis (ORA), functional class scoring (FCS), and topology-based (TB) analysis are considered the three generations of pathway analysis techniques. These methods detect only the differential activity of a pathway as a whole, ignoring the contribution of individual routes and sections within it. This paper presents Route based Pathway Analysis in Cohorts (rPAC), a novel framework that makes full use of pathway topology by identifying and scoring individual routes within pathways. Using transcriptomic data from each sample in a given cohort, rPAC computes activity scores and p-values for all signaling and effector routes in KEGG signaling pathways. Overall route activity in a cohort is summarized by two metrics: "Proportion of Significance" (PS) and "Average Route Score" (ARS). A systematic evaluation on a large number of simulated datasets showed that rPAC significantly outperforms traditional pathway analysis methods. Case studies of three epithelial cancers from The Cancer Genome Atlas (TCGA) revealed that some pathway routes (e.g., tight junction, Th17 cell differentiation, and adipocytokine signaling) notably differentiate cancer types, while other routes related to lipid and adipocyte metabolism are co-regulated across cancers. Although most findings are consistent with the current understanding of cancer biology, rPAC also identified many previously uncharacterized mechanisms, showing its potential to yield new insights into cancer phenotypes.
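The abstract does not specify how rPAC identifies routes. One plausible reading, assumed here, is that a route is a simple directed path from a source node (no incoming edges) to a sink node (no outgoing edges) of a pathway graph, such as one built from a KEGG pathway. The sketch below enumerates such paths by depth-first search. The function name `enumerate_routes`, the edge-list input, and the toy node labels are hypothetical, and the paper's distinction between signaling and effector routes is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def enumerate_routes(edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Return every simple path from a source node (no incoming edges)
    to a sink node (no outgoing edges) of a directed graph.

    Assumption: a "route" is modeled as such a source-to-sink path.
    """
    graph: Dict[str, List[str]] = defaultdict(list)
    nodes: Set[str] = set()
    has_incoming: Set[str] = set()
    for u, v in edges:
        graph[u].append(v)
        nodes.update((u, v))
        has_incoming.add(v)

    # Sources are nodes that never appear as an edge target.
    sources = sorted(n for n in nodes if n not in has_incoming)
    routes: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        successors = graph.get(node, [])
        if not successors:
            # A node with no outgoing edges is a sink: record the route.
            routes.append(path.copy())
            return
        for nxt in successors:
            if nxt in path:
                # Skip nodes already on the path so routes stay acyclic;
                # branches that can only loop back yield no route.
                continue
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    for s in sources:
        dfs(s, [s])
    return routes


# Toy graph with generic labels (not a real KEGG pathway).
print(enumerate_routes([("A", "B"), ("B", "C"), ("B", "D"), ("E", "D")]))
```

For this toy graph the call returns `[['A', 'B', 'C'], ['A', 'B', 'D'], ['E', 'D']]`. Explicit enumeration is practical for pathway-sized graphs, but the number of simple paths can grow exponentially in dense graphs. If NetworkX is available, `networkx.all_simple_paths(G, source, target)` yields the simple paths for a single source-target pair.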
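The abstract names the two cohort-level metrics but does not define them. Assuming, from their names, that PS is the fraction of cohort samples in which a route's p-value falls below a significance threshold, and that ARS is the mean of the route's per-sample activity scores, the summaries could be computed as in this sketch. The function `summarize_routes`, the threshold `alpha = 0.05`, and the dictionary layout are illustrative assumptions; the per-sample scores and p-values are taken as already computed.

```python
from typing import Dict, List, Tuple


def summarize_routes(
    scores: Dict[str, List[float]],
    pvalues: Dict[str, List[float]],
    alpha: float = 0.05,
) -> Dict[str, Tuple[float, float]]:
    """Map each route to (PS, ARS) under assumed definitions:

    PS  = fraction of samples whose route p-value is below alpha
    ARS = mean route activity score across samples
    """
    summary: Dict[str, Tuple[float, float]] = {}
    for route, route_scores in scores.items():
        route_p = pvalues[route]
        if not route_scores or len(route_p) != len(route_scores):
            raise ValueError(
                f"route {route!r}: need equal-length, non-empty score and p-value lists"
            )
        # Count samples where this route is significant at level alpha.
        ps = sum(p < alpha for p in route_p) / len(route_p)
        # Average the per-sample activity scores.
        ars = sum(route_scores) / len(route_scores)
        summary[route] = (ps, ars)
    return summary


# Hypothetical per-sample values for two routes.
print(summarize_routes(
    {"route_1": [2.0, 1.5, -0.5, 1.0], "route_2": [-1.0, -2.0]},
    {"route_1": [0.01, 0.03, 0.40, 0.04], "route_2": [0.2, 0.6]},
))
```

Here `route_1` is significant in three of four samples, giving PS = 0.75 and ARS = 1.0, while `route_2` gives PS = 0.0 and ARS = -1.5. Reporting both values separates how consistently a route is significant across the cohort from how strong and in which direction its activity is on average.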