Abdullah Bawakid. 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, pp. 452-455. Published 2015-08-26. DOI: 10.1109/IHMSC.2015.68 (https://doi.org/10.1109/IHMSC.2015.68)
Using Wikipedia Categories for Discovering the Themes of Text Documents
This paper describes a new unsupervised approach for identifying the main themes of any text document with the aid of Wikipedia. In contrast to other approaches, the proposed algorithm relies on only two aspects of Wikipedia: its article titles and its category structure. The body text of Wikipedia articles is not used. We describe how to build a Term-Categories vector that defines how strongly a term is associated with a Wikipedia concept. We also explain how this vector is used when processing a text document to discover its main themes. We evaluate the method by attempting to predict the most representative categories for a subset of Wikipedia articles.
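The abstract does not give the weighting scheme, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a term's association with a category is the normalized count of article titles that contain the term and belong to that category, and that a document's themes are ranked by summing the vectors of its terms. The toy data and the names `TITLE_CATEGORIES`, `build_term_categories`, and `rank_themes` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: article title -> categories it belongs to.
# A real system would read these from a Wikipedia dump.
TITLE_CATEGORIES = {
    "Neural network": ["Machine learning", "Artificial intelligence"],
    "Decision tree learning": ["Machine learning"],
    "Bayesian network": ["Machine learning", "Statistics"],
    "Linear regression": ["Statistics"],
}


def build_term_categories(title_categories):
    """Map each title term to normalized weights over categories.

    Assumed weighting: share of (term, category) co-occurrences
    among all categories seen with that term.
    """
    counts = defaultdict(Counter)
    for title, categories in title_categories.items():
        for term in title.lower().split():
            counts[term].update(categories)

    vectors = {}
    for term, counter in counts.items():
        total = sum(counter.values())
        vectors[term] = {cat: n / total for cat, n in counter.items()}
    return vectors


def rank_themes(text, vectors, top_k=3):
    """Sum the vectors of the document's terms and return the top categories."""
    scores = Counter()
    for term in text.lower().split():
        for cat, weight in vectors.get(term, {}).items():
            scores[cat] += weight
    return scores.most_common(top_k)


vectors = build_term_categories(TITLE_CATEGORIES)
themes = rank_themes("a neural network and a decision tree", vectors)
for category, score in themes:
    print(f"{category}: {score:.2f}")
```

Only titles and category memberships are read, in line with the abstract's statement that article body text is not used. Terms that appear in no title, such as "a" and "and" here, contribute nothing to the scores.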