A Novel Hybrid system for Large-Scale Chinese Text Classification Problem
Zhong Gao, Guanming Lu, Daquan Gu
2008 Japan-China Joint Workshop on Frontier of Computer Science and Technology, published 2008-12-27
DOI: 10.1109/FCST.2008.29
Most Chinese text classification systems are based on the bag-of-words (BW) model, an effective probabilistic tool for text representation that can provide a better semantic structure. However, their classification accuracy remains a weakness that has not yet been overcome. Support vector machines (SVMs) have become a popular classification tool and can be applied within this scheme, but their main disadvantages are the large memory requirements and long computation times needed to handle very large datasets. In this paper, we propose a hybrid system based on BW and a novel cascade SVM with feedback, which splits the problem into smaller subsets and trains a network to assign samples to the different subsets. The proposed parallel training algorithm, which applies multiple SVM classifiers to large-scale classification problems, speeds up SVM training and increases classification accuracy.
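Chinese text has no spaces between words, so a bag-of-words representation needs a tokenization step before any counting. The abstract does not say how the authors segment text or weight terms. The sketch below assumes overlapping character bigrams as a crude stand-in for a word segmenter and uses raw term-frequency counts. The helper names `char_bigrams`, `build_vocabulary` and `to_bow_vector` are hypothetical.

```python
from collections import Counter

def char_bigrams(text):
    """Split a Chinese string into overlapping character bigrams.

    A crude stand-in for real word segmentation, assumed here for illustration.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars  # too short for a bigram: keep the single character (if any)
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]

def build_vocabulary(docs):
    """Assign a column index to every bigram seen in the training documents."""
    vocab = {}
    for doc in docs:
        for token in char_bigrams(doc):
            vocab.setdefault(token, len(vocab))  # new tokens get the next free index
    return vocab

def to_bow_vector(text, vocab):
    """Raw term-frequency vector; tokens missing from the vocabulary are dropped."""
    vec = [0] * len(vocab)
    for token, count in Counter(char_bigrams(text)).items():
        if token in vocab:
            vec[vocab[token]] = count
    return vec

# "The stock market rises" / "The football match begins"
docs = ["股票市场上涨", "足球比赛开始"]
vocab = build_vocabulary(docs)
print(to_bow_vector("股票上涨", vocab))  # → [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```

Only the bigrams 股票 and 上涨 from the query appear in the training vocabulary, so only their columns are non-zero. Sparse count vectors like these are the kind of input the SVM stage would consume.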
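The abstract does not spell out the authors' cascade architecture or the network that assigns samples to subsets. As a rough illustration of the general idea only, the sketch below follows the generic Cascade SVM with feedback of Graf et al. (2004). The data are split into subsets and an SVM is trained on each subset. Only its support vectors (SVs) are passed on, SV sets are merged pairwise and retrained until one model remains, and the final SVs are fed back into the first layer until they stop changing. Several assumptions apply. The partition is a simple round-robin split, not a learned assignment network. The solver is a Pegasos-style SGD for a linear SVM, substituted for a kernel SVM. "Support vectors" are approximated as points within a small tolerance of the margin. Layers run sequentially, whereas a parallel implementation would train each subset of a layer on its own processor. All function names and parameters (`lam`, `epochs`, `tol`, `n_subsets`, `max_rounds`) are hypothetical.

```python
import random

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD for a linear soft-margin SVM (stand-in solver).

    A constant 1.0 feature is appended to each sample to act as the bias term.
    """
    Xa = [x + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xa)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * dot(w, Xa[i]) < 1.0  # hinge loss active before the step
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violates:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def train_and_filter(X, y, ids, tol=0.05):
    """Train on the samples in `ids`; return the model and its approximate SVs."""
    w = train_linear_svm([X[i] for i in ids], [y[i] for i in ids])
    svs = [i for i in ids if y[i] * dot(w, X[i] + [1.0]) <= 1.0 + tol]
    return w, (svs if svs else list(ids))  # never pass an empty set to the next layer

def merge_pairs(groups):
    """Union neighbouring SV sets two at a time, forming the next cascade layer."""
    merged = []
    for k in range(0, len(groups), 2):
        union = set(groups[k])
        if k + 1 < len(groups):
            union |= set(groups[k + 1])
        merged.append(sorted(union))
    return merged

def cascade_svm(X, y, n_subsets=4, max_rounds=5):
    """Cascade SVM with feedback; layers run one after another in this sketch."""
    ids = list(range(len(X)))
    subsets = [s for s in (ids[k::n_subsets] for k in range(n_subsets)) if s]
    feedback, w = [], None
    for _ in range(max_rounds):
        # First layer: each subset plus the SVs fed back from the previous pass.
        results = [train_and_filter(X, y, sorted(set(s) | set(feedback))) for s in subsets]
        # Later layers: merge SV sets pairwise and retrain until one model is left.
        while len(results) > 1:
            results = [train_and_filter(X, y, g)
                       for g in merge_pairs([svs for _, svs in results])]
        w, final_svs = results[0]
        if final_svs == feedback:  # feedback no longer changes the SV set: stop
            break
        feedback = final_svs
    return w, feedback

def predict(w, x):
    return 1 if dot(w, x + [1.0]) >= 0 else -1

# Toy data: class +1 in the square [1, 3]^2, class -1 in [-3, -1]^2.
rng = random.Random(1)
X = ([[2 + rng.uniform(-1, 1), 2 + rng.uniform(-1, 1)] for _ in range(20)]
     + [[-2 + rng.uniform(-1, 1), -2 + rng.uniform(-1, 1)] for _ in range(20)])
y = [1] * 20 + [-1] * 20
w, svs = cascade_svm(X, y)
accuracy = sum(predict(w, x) == label for x, label in zip(X, y)) / len(X)
print(len(svs), accuracy)
```

The memory saving comes from the filtering step. Each model in the cascade sees only one subset or a union of two SV sets, never the full training set, and the feedback loop lets SVs discovered late re-enter the first layer where they may change which points survive.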