Short Text Classification Using Wikipedia Concept Based Document Representation
Authors: Xiang Wang, R. Chen, Yan Jia, Bin Zhou
Venue: 2013 International Conference on Information Technology and Applications
Published: 2013-11-16
DOI: 10.1109/ITA.2013.114 (https://doi.org/10.1109/ITA.2013.114)
Short text classification is a challenging task in information retrieval systems because the text is short, sparse and multidimensional. In this paper, we represent short texts with Wikipedia concepts for classification. Each short document is mapped to Wikipedia concepts, and those concepts are then used to represent the document for text categorization. Traditional classification methods such as SVM can then be applied to the Wikipedia concept document representation. Experimental evaluation on real Google search snippets shows that our approach outperforms the traditional bag-of-words (BOW) method. Although it does not outperform the state-of-the-art classifier (see e.g. Phan et al., WWW '08), our method is easy to implement at low cost.
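The representation step described above can be illustrated with a minimal sketch. The abstract does not say how text is mapped to Wikipedia concepts, so everything here is an assumption made for illustration: the tiny `CONCEPT_INDEX` table, the greedy longest-match lookup, and the helper names `map_to_concepts` and `to_vector` are all hypothetical and are not the authors' method.

```python
from collections import Counter

# Hypothetical phrase-to-concept table (an assumption). A real system would
# derive such a mapping from Wikipedia itself; the abstract gives no details.
CONCEPT_INDEX = {
    "svm": "Support vector machine",
    "support vector machine": "Support vector machine",
    "nba": "National Basketball Association",
    "basketball": "Basketball",
    "stock market": "Stock market",
}

MAX_PHRASE_LEN = 3  # longest phrase, in tokens, that is looked up


def map_to_concepts(text):
    """Return a bag of concepts found by greedy longest-match phrase lookup."""
    tokens = text.lower().split()
    concepts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CONCEPT_INDEX:
                concepts[CONCEPT_INDEX[phrase]] += 1
                i += n
                break
        else:
            i += 1  # no known concept starts at this token
    return concepts


def to_vector(concepts, vocabulary):
    """Project a concept bag onto a fixed, ordered concept vocabulary."""
    return [concepts.get(c, 0) for c in vocabulary]


vocabulary = sorted(set(CONCEPT_INDEX.values()))
snippet = "NBA basketball scores and stock market news"
vector = to_vector(map_to_concepts(snippet), vocabulary)
```

Each snippet becomes a fixed-length count vector over concepts rather than over raw words. Vectors of this shape could then be passed to any standard classifier, such as the SVM mentioned in the abstract.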