{"title":"Prediction of gene expression in human using rat in vivo gene expression in Japanese Toxicogenomics Project","authors":"Martin Otava, Z. Shkedy, Adetayo S Kasim","doi":"10.4161/sysb.29412","DOIUrl":"https://doi.org/10.4161/sysb.29412","url":null,"abstract":"The Japanese Toxicogenomics Project (TGP) provides large amount of data for the toxicology and safety framework. We focus on gene expression data of rat in vivo and human in vitro. We consider two different analyses for the TGP data. The first analysis is based on two-way analysis of variance model and the goal is to detect genes with significant dose-response relationship in both humans and rats. The second analysis consists of a trend analysis at each time point and the goal is to detect genes in the rat in order to predict gene expression in humans. The first analysis leads us to conclusions about the heterogeneity of the compound set and will suggest how to address this issue to improve future analyses. In the second part, we identify, for particular compounds, groups of genes that are translatable from rats to humans, so they can be used for prediction of human in vitro data based on rat in vivo data.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"2 1","pages":"15 - 8"},"PeriodicalIF":0.0,"publicationDate":"2014-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.29412","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of lung adenocarcinoma and squamous cell carcinoma samples based on their gene expression profile in the sbv IMPROVER Diagnostic Signature Challenge","authors":"Rotem Ben-Hamo, S. Boué, F. Martin, M. Talikka, S. Efroni","doi":"10.4161/sysb.25983","DOIUrl":"https://doi.org/10.4161/sysb.25983","url":null,"abstract":"Barriers, such as the lack of confidence in the robustness of disease signatures based on gene expression measurements, still hinder progress toward personalized medicine. It is therefore important that once derived, a signature is verified via an unbiased process. The IMPROVER initiative was set up to establish an impartial view of methods and results for the classification of patients, based on molecular profiles of disease-relevant or surrogate tissues. Here, the focus is on the Lung Cancer Signature Challenge, in which participants have been asked to classify lung tumor gene expression profiles into 4 classes: adenocarcinoma (AC) and squamous cell carcinoma (SCC), each at either stage 1 or 2. The method reported here was the best performing method in the 4-way classification. The original method is presented as well as an algorithmic approach to replace the empirical (non-computational) steps used in the challenge. In the discussion, the difficulty in classifying stages of tumors as compared with the relatively good classification of subtypes is examined. Hypotheses are made concerning possible reasons for erroneous classification of some of the samples, in view of additional information on the test samples that was not made available to challenge participants.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"268 - 277"},"PeriodicalIF":0.0,"publicationDate":"2013-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.25983","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical-TGDR","authors":"S. Tian, M. Suárez-Fariñas","doi":"10.4161/sysb.25979","DOIUrl":"https://doi.org/10.4161/sysb.25979","url":null,"abstract":"Regularization methods that simultaneously select a small set of the most relevant features and build a classifier using the selected features have gained much attention recently in problems of classification of “omics” data. In many multi-class classification problems, which are of practical importance, the classes are naturally endowed with a hierarchical structure. However, such natural hierarchical structure is often ignored. Here, we use an existing regularization algorithm, Threshold Gradient Descent Regularization, in a hierarchical fashion, which takes advantage of natural biological structure to specifically tackle multi-class classification of microarray data. We apply this approach to one of the tasks presented by the sbv IMPROVER Diagnostic Signature Challenge: the Lung Cancer Sub-Challenge. Gene expression data from non-small cell lung carcinoma were used to classify tumors into adenocarcinoma and squamous cell carcinoma subtypes, and their clinical stages (I and II). Genetic and transcriptomic differences between AC and SCC have been reported, indicating a potentially different pathological mechanism of differentiation and invasion. The results from this analysis show that hierarchical-TGDR outperforms pairwise TGDRs in terms of predictive performance, and is substantially more parsimonious. In conclusion, the hierarchical-TGDR approach trains classifiers in a top-down fashion by considering the naturally existing structure within the data, reducing the number of pairwise-TGDRs to be trained. It also highlights different mechanisms of “invasion” in the two subtypes. This work suggests that incorporating known biological information into classification algorithms, such as data hierarchies, can improve the discriminative performance and biological interpretation of this classifier.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"278 - 287"},"PeriodicalIF":0.0,"publicationDate":"2013-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.25979","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel-based method for feature selection and disease diagnosis using transcriptomics data","authors":"Ji-Hoon Cho, Alan Lin, Kai Wang","doi":"10.4161/sysb.25978","DOIUrl":"https://doi.org/10.4161/sysb.25978","url":null,"abstract":"Global transcriptome profiling is the foundation of systems biology and has been extensively used in biomarker discovery. Tools have been developed to extract meaningful biological information and useful gene features from transcriptomics data. However, there is no commonly accepted method for such purposes. The first IMPROVER (industrial methodology for process verification of research) challenge was launched to assess and verify classification methods using transcriptomics data from clinical samples. We established a computational approach that combined a kernel Fisher discriminant classifier and a feature selection scheme, which used scaled alignment selection and recursive feature elimination methods. A simple and reliable batch effect correction approach was also used. With this approach, a set of informative genes, i.e., biomarker candidates, could be identified for disease diagnosis and classification. We applied this approach to the sbv IMPROVER Challenge and achieved the highest rank in the psoriasis sub-challenge. Here, we describe our methodology and results for the sub-challenge.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"254 - 260"},"PeriodicalIF":0.0,"publicationDate":"2013-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.25978","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relapsing-remitting multiple sclerosis classification using elastic net logistic regression on gene expression data","authors":"Cheng Zhao, A. Deshwar, Q. Morris","doi":"10.4161/sysb.26131","DOIUrl":"https://doi.org/10.4161/sysb.26131","url":null,"abstract":"As part of the first Industrial Methodology for Process Verification in Research Challenge, the aim of the MS Diagnostic sub-challenge was to identify a robust diagnostic signature for relapsing-remitting multiple sclerosis from gene expression data. In this regard, we built a classifier that discriminates samples into two phenotype groups, either RRMS or controls, using the transcriptome of peripheral blood mononuclear cells. For our classifier, we used logistic regression with elastic net regression as implemented in the glmnet package in R. We selected the values of the regularization hyper-parameters using cross-validation performance on the provided training data, number of non-zero parameters in our model, and based on the distribution of output values when the input vector for the test data were used with our classifier. We analyzed our classifier performance with two different strategies for feature extraction, using either only genes or including additional constructed features from gene pathways data. The two different strategies produced little differences in performance when comparing the 10-fold cross-validation of the training data and prediction on the test data. Our final submission for the sub-challenge used only genes as features, and identified a diagnostic signature consisting of 58 genes, that was ranked second out of a total of 39 submissions.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"247 - 253"},"PeriodicalIF":0.0,"publicationDate":"2013-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.26131","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting COPD status with a random generalized linear model","authors":"Lin Song, S. Horvath","doi":"10.4161/sysb.25981","DOIUrl":"https://doi.org/10.4161/sysb.25981","url":null,"abstract":"Sample classification, especially disease status prediction, is an important area of investigation for gene expression studies. Many machine learning methods have been developed to tackle this problem. To evaluate different prediction methods, the IMPROVER Challenge made several data sets available. Here we focus on one sub-challenge: chronic obstructive pulmonary disease (COPD). We outlined critical preprocessing steps to make training and test data comparable. We compared our recently introduced random generalized linear model (RGLM) predictor with Leo Breiman’s random forest (RF) predictor on the COPD data set. We discussed potential reasons for the superior performance of the RGLM predictor in this sub-challenge. Interestingly, we found that although several genes were highly predictive of COPD status, none were necessary to achieve accurate prediction when demographic features smoking status and age were used. In conclusion, RGLM achieved superior predictive accuracy for predicting COPD status with smoking status and age as mandatory features. Future cohort studies could evaluate whether the resulting predictor has clinical utility.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"261 - 267"},"PeriodicalIF":0.0,"publicationDate":"2013-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.25981","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"sbv IMPROVER Diagnostic Signature Challenge","authors":"K. Rhrissorrakrai, John Rice, S. Boué, M. Talikka, E. Bilal, F. Martin, Pablo Meyer, R. Norel, Yang Xiang, G. Stolovitzky, J. Hoeng, M. Peitsch","doi":"10.4161/sysb.26325","DOIUrl":"https://doi.org/10.4161/sysb.26325","url":null,"abstract":"The sbv IMPROVER (systems biology verification—Industrial Methodology for Process Verification in Research) process aims to help companies verify component steps or tasks in larger research workflows for industrial applications. IMPROVER is built on challenges posed to the community that draws on the wisdom of crowds to assess the most suitable methods for a given research task. The Diagnostic Signature Challenge, open to the public from Mar. 5 to Jun. 21, 2012, was the first instantiation of the IMPROVER methodology and evaluated a fundamental biological question, specifically, if there is sufficient information in gene expression data to diagnose diseases. Fifty-four teams used publically available data to develop prediction models in four disease areas: multiple sclerosis, lung cancer, psoriasis, and chronic obstructive pulmonary disease. The predictions were scored against unpublished, blinded data provided by the organizers, and the results, including methods of the top performers, presented at a conference in Boston on Oct. 2–3, 2012. This paper offers an overview of the Diagnostic Signature Challenge and the accompanying symposium, and is the first article in a special issue of Systems Biomedicine, providing focused reviews of the submitted methods and general conclusions from the challenge. Overall, it was observed that optimal method choice and performance appeared largely dependent on endpoint, and results indicate the psoriasis and lung cancer subtypes sub-challenges were more accurately predicted, while the remaining classification tasks were much more challenging. Though no one approach was superior for every sub-challenge, there were methods, like linear discriminant analysis, that were found to perform consistently well in all.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"196 - 207"},"PeriodicalIF":0.0,"publicationDate":"2013-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.26325","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"sbv IMPROVER Diagnostic Signature Challenge","authors":"R. Norel, E. Bilal, Nathalie Conrad-Chemineau, Richard Bonneau, A. G. de la Fuente, I. Jurisica, D. Marbach, Pablo Meyer, J. Rice, T. Tuller, G. Stolovitzky","doi":"10.4161/sysb.26326","DOIUrl":"https://doi.org/10.4161/sysb.26326","url":null,"abstract":"Evaluating the performance of computational methods to analyze high throughput data are an integral component of model development and critical to progress in computational biology. In collaborative-competitions, model performance evaluation is crucial to determine the best performing submission. Here we present the scoring methodology used to assess 54 submissions to the IMPROVER Diagnostic Signature Challenge. Participants were tasked with classifying patients’ disease phenotype based on gene expression data in four disease areas: Psoriasis, Chronic Obstructive Pulmonary Disease, Lung Cancer, and Multiple Sclerosis. We discuss the criteria underlying the choice of the three scoring metrics we chose to assess the performance of the submitted models. The statistical significance of the difference in performance between individual submissions and classification tasks varied according to these different metrics. Accordingly, we consider an aggregation of these three assessment methods and present the approaches considered for aggregating the ranking and ultimately determining the final overall best performer.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"208 - 216"},"PeriodicalIF":0.0,"publicationDate":"2013-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.26326","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"sbv IMPROVER Diagnostic Signature Challenge","authors":"J. Hoeng, G. Stolovitzky, M. Peitsch","doi":"10.4161/sysb.26324","DOIUrl":"https://doi.org/10.4161/sysb.26324","url":null,"abstract":"The task of predicting disease phenotype from gene expression data has been addressed hundreds if not thousands of times in the recent literature. This expanding body of work is not only an indication that the problem is of great importance and general interest, but it also reveals that neither the experimental nor the computational limitations of translating data to disease information have been satisfactorily understood. To contribute to the advancement of the field, promote collaborative thinking and enable a fair and unbiased comparison of methods, IMPROVER revisited the problem of gene-expression to phenotype prediction using a collaborative-competition paradigm. This special issue of Systems Biomedicine reports the results of the sbv IMPROVER Diagnostic Signature Challenge designed to identify best analytic approaches to predict phenotype from gene expression data.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"193 - 195"},"PeriodicalIF":0.0,"publicationDate":"2013-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.26324","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methodological approach from the Best Overall Team in the sbv IMPROVER Diagnostic Signature Challenge","authors":"A. Tarca, N. Than, R. Romero","doi":"10.4161/sysb.25980","DOIUrl":"https://doi.org/10.4161/sysb.25980","url":null,"abstract":"The sbv IMPROVER Diagnostic Signature Challenge used crowdsourcing to identify the best methods to classify clinical samples using transcriptomics data. Participating teams used public microarray data sets to develop prediction models in four disease areas, and then made predictions on blinded test data generated by the organizers. Here we describe the approach of the team for the Perinatology Research Branch (Team PRB; AL Tarca, R Romero), that was awarded the best performing entrant prize out of 54 entrants. The key elements of our approach included: (1) selection of training data sets by trial and error; (2) removal of batch effects by pre-processing the test and training data together; (3) the use of statistical significance and magnitude of change to select biomarkers; and (4) optimization of the number of biomarkers via the cross-validated performance of a simple linear discriminant analysis (LDA) model. Not only were our resulting models ranked consistently high, but they also generated parsimonious signatures of as low as two genes, unlike most of the other top-ranked teams that used hundreds of genes for prediction.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"217 - 227"},"PeriodicalIF":0.0,"publicationDate":"2013-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.25980","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70654950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}