Analysis and visualization of functional relationships between RNA expression and clinical annotation using PathlinX.
Scott L Carter
Proceedings. AMIA Symposium, 2002
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2244258/pdf/procamiasymp00001-0162.pdf
Abstract
We analyzed a publicly available dataset of gene-expression measurements from 105 lung carcinomas, joined with clinical parameters describing the age, smoking history, and survival statistics of the patients from whom the tumors originated. Our aim was to demonstrate how the unsupervised analysis technique embodied in PathlinX lets researchers quickly gain an intuition for the most significant relationships between heterogeneous data elements. A variety of metrics were evaluated empirically for their ability to distinguish biological signal in the data from random noise; this was accomplished by randomly permuting the data rows and then performing a comprehensive pairwise comparison of all experimental elements. Significance thresholds were established from the metric scores obtained on the permuted data, and sub-threshold associations were removed. The remaining associations were grouped by a transitive closure process to generate undirected graphs of associations called PathlinX networks. We discuss the features of each generated PathlinX network and demonstrate the ability of the technique to highlight biological features in large heterogeneous datasets.
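The pipeline described in the abstract (permute, score all pairs, threshold, group by transitive closure) can be illustrated with a minimal sketch. This is not the PathlinX implementation. The abstract does not name the metrics or the threshold rule, so several choices here are assumptions: absolute Pearson correlation stands in for the metric, "permutation of the data rows" is read as shuffling each row's values independently, the threshold is a high quantile of the permuted scores, and the function names `abs_corr`, `permutation_threshold`, and `networks` are hypothetical.

```python
import numpy as np


def abs_corr(data):
    """Absolute Pearson correlation between every pair of rows (assumed metric)."""
    return np.abs(np.corrcoef(data))


def permutation_threshold(data, n_perm=20, quantile=0.999, seed=0):
    """Estimate a significance threshold from scores on permuted data.

    Assumption: each row is shuffled independently, which destroys any
    real association between rows while keeping each row's distribution.
    """
    rng = np.random.default_rng(seed)
    upper = np.triu_indices(data.shape[0], k=1)  # each unordered pair once
    null_scores = []
    for _ in range(n_perm):
        permuted = rng.permuted(data, axis=1)
        null_scores.append(abs_corr(permuted)[upper])
    return float(np.quantile(np.concatenate(null_scores), quantile))


def networks(data, threshold):
    """Drop sub-threshold pairs, then group rows by transitive closure.

    The transitive closure of an undirected association graph is its set
    of connected components, computed here with a small union-find.
    """
    n = data.shape[0]
    scores = abs_corr(data)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if scores[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Rows with no surviving association do not form a network.
    return [members for members in groups.values() if len(members) > 1]
```

Given a matrix with one row per data element (a gene or a clinical parameter) and one column per tumor, `networks(data, permutation_threshold(data))` returns lists of row indices, each list corresponding to one network. Grouping by connected components means that two elements can share a network without being directly associated, as long as a chain of above-threshold associations links them.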