Words Clustering Based on Keywords Indexing from Large-scale Categorization Corpora

2009 Fifth International Conference on Information Assurance and Security. Pub Date: 2009-08-18. DOI: 10.1109/IAS.2009.271

Liu Hua

Citations: 1

Abstract

Keywords are indexed automatically for large-scale categorized corpora. Keywords that index more than 20 documents are selected as seed words, which overcomes the subjectivity of choosing seed words for clustering. At the same time, clustering is restricted to the corpus of a particular category, and a feature extraction method based on keyword indexing is adopted to obtain domain words automatically, which reduces noise in the similarity calculation.
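The seed-word selection step can be illustrated with a minimal sketch. It assumes each document in a category has already been reduced to a list of indexed keywords; the abstract does not say how the indexing itself is done, so that step is left out. The function names `select_seed_words` and `seeds_by_category`, the `min_docs` parameter, and the dictionary layout of the corpus are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def select_seed_words(doc_keywords: Iterable[Iterable[str]], min_docs: int = 20) -> list[str]:
    """Return keywords that index more than `min_docs` documents.

    `doc_keywords` holds one keyword collection per document (assumed input format).
    """
    doc_freq: Counter[str] = Counter()
    for keywords in doc_keywords:
        # Count each keyword at most once per document.
        doc_freq.update(set(keywords))
    # Most frequent first, ties broken alphabetically for a stable order.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, n_docs in ranked if n_docs > min_docs]


def seeds_by_category(corpus: dict[str, list[list[str]]], min_docs: int = 20) -> dict[str, list[str]]:
    """Select seed words separately inside each category (assumed corpus layout)."""
    return {cat: select_seed_words(docs, min_docs) for cat, docs in corpus.items()}


if __name__ == "__main__":
    # Toy data: "network" indexes 21 documents, "virus" indexes only 20.
    toy = {"security": [["network", "virus"]] * 20 + [["network", "network"]]}
    print(seeds_by_category(toy))
```

Selecting seeds by a document-count threshold inside one category at a time mirrors the two ideas in the abstract: seeds are chosen without manual judgment, and similarity is later computed only among words from the same domain.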
