{"title":"Hierarchical-TGDR","authors":"S. Tian, M. Suárez-Fariñas","doi":"10.4161/sysb.25979","journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"278 - 287"},"publicationDate":"2013-09-20","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.25979","citationCount":"9","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.4161/sysb.25979"}
Regularization methods that simultaneously select a small set of the most relevant features and build a classifier from them have recently gained much attention for classifying “omics” data. In many multi-class classification problems of practical importance, the classes are naturally endowed with a hierarchical structure, yet this structure is often ignored. Here, we use an existing regularization algorithm, Threshold Gradient Descent Regularization (TGDR), in a hierarchical fashion that takes advantage of natural biological structure to tackle multi-class classification of microarray data. We apply this approach to one of the tasks presented by the sbv IMPROVER Diagnostic Signature Challenge: the Lung Cancer Sub-Challenge. Gene expression data from non-small cell lung carcinoma were used to classify tumors into adenocarcinoma (AC) and squamous cell carcinoma (SCC) subtypes and by clinical stage (I or II). Genetic and transcriptomic differences between AC and SCC have been reported, indicating potentially different pathological mechanisms of differentiation and invasion. The results of this analysis show that hierarchical-TGDR outperforms pairwise TGDRs in predictive performance and is substantially more parsimonious. In conclusion, the hierarchical-TGDR approach trains classifiers in a top-down fashion by exploiting the structure that naturally exists within the data, which reduces the number of pairwise TGDRs to be trained. It also highlights different mechanisms of “invasion” in the two subtypes. This work suggests that incorporating known biological information, such as data hierarchies, into classification algorithms can improve both the discriminative performance and the biological interpretability of the resulting classifier.
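To make the top-down idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a logistic loss and a two-level hierarchy (subtype first, then stage within each subtype). The function names `tgdr_logistic`, `fit_hierarchical`, and `predict_hierarchical`, and the hyperparameters `tau`, `step`, and `n_iter`, are hypothetical choices made for illustration. The thresholding rule shown is the generic threshold gradient descent idea: at each iteration, only coefficients whose gradient magnitude is at least `tau` times the largest gradient magnitude are updated.

```python
import numpy as np


def tgdr_logistic(X, y, tau=0.9, step=0.1, n_iter=200):
    """Threshold gradient descent for binary logistic regression (sketch).

    X: (n, p) feature matrix; y: (n,) labels in {0, 1}.
    tau in [0, 1]: larger values update fewer coefficients per iteration.
    Returns (intercept, beta).
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    for _ in range(n_iter):
        z = np.clip(intercept + X @ beta, -30.0, 30.0)  # avoid overflow in exp
        prob = 1.0 / (1.0 + np.exp(-z))
        resid = y - prob
        grad = X.T @ resid / n  # gradient of the mean log-likelihood
        # Thresholding step: update only coefficients whose gradient is
        # within a factor tau of the largest gradient magnitude.
        mask = np.abs(grad) >= tau * np.abs(grad).max()
        beta += step * grad * mask
        intercept += step * resid.mean()  # intercept is never thresholded
    return intercept, beta


def fit_hierarchical(X, subtype, stage, **kwargs):
    """Train three binary models: AC vs SCC, then stage I vs II per subtype.

    Assumed hierarchy (illustrative): subtype at the top, stage below.
    """
    subtype = np.asarray(subtype)
    stage = np.asarray(stage)
    models = {"subtype": tgdr_logistic(X, (subtype == "SCC").astype(float), **kwargs)}
    for name in ("AC", "SCC"):
        rows = subtype == name
        models[name] = tgdr_logistic(X[rows], (stage[rows] == 2).astype(float), **kwargs)
    return models


def predict_hierarchical(models, X):
    """Predict (subtype, stage) top-down: subtype first, then its stage model."""

    def positive(model):
        b0, b = model
        return b0 + X @ b > 0

    is_scc = positive(models["subtype"])
    is_stage2 = np.where(is_scc, positive(models["SCC"]), positive(models["AC"]))
    return [("SCC" if s else "AC", 2 if t else 1) for s, t in zip(is_scc, is_stage2)]
```

Under this assumed hierarchy, the four classes (AC-I, AC-II, SCC-I, SCC-II) require three binary models, whereas an all-pairs scheme over four classes would require six. A call such as `fit_hierarchical(X, subtype, stage, tau=0.9)` followed by `predict_hierarchical(models, X_new)` would use the sketch on an expression matrix `X` with one row per sample.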