Text Clustering Algorithm Based on Lexical Graph

Yun Sha, Guoying Zhang, Huina Jiang
Published in: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007)
Publication date: 2007-08-24
DOI: 10.1109/FSKD.2007.560 (https://doi.org/10.1109/FSKD.2007.560)
Citations: 2

Abstract

Text clustering methods group texts into thematic clusters, an important topic in many fields, such as search engines. Well-known text clustering methods, however, do not really address the special problems of text clustering: the very high dimensionality of the data and the understandability of the cluster description. This paper proposes a text clustering algorithm based on a lexical graph, which is a kind of term-based clustering method. The lexical graph is built with nodes representing words and edges representing their co-occurrence in text. The attribute of each node is the set of texts in which the word occurs. A cluster center is defined as a node (word) with large degree in this graph; the texts attributed to the center and to its neighbors are assigned to one cluster, whose description is the center node. This approach drastically reduces the dimensionality of the data and improves the synonymy extension ability. An experimental evaluation on Web documents as well as classical text documents demonstrates that the proposed algorithm obtains clusterings of comparable quality significantly more efficiently than the K-Means and STC algorithms on the search results data set. Furthermore, the method provides an understandable description of each discovered cluster through its center.
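
The construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify tokenization, stop-word handling, the co-occurrence window, or the degree threshold. The sketch assumes document-level co-occurrence, ranks words by degree with alphabetical tie-breaking, and greedily assigns each text to the first center that claims it. The function names `build_lexical_graph` and `cluster_by_centers` and the parameter `max_centers` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_lexical_graph(docs):
    """Build a lexical graph from tokenized documents.

    Returns (neighbors, postings):
      neighbors[w] -- words that co-occur with w in at least one document
      postings[w]  -- indices of the documents containing w (the node attribute)
    Assumption: two words co-occur if they appear in the same document.
    """
    neighbors = defaultdict(set)
    postings = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        words = set(tokens)
        for word in words:
            postings[word].add(doc_id)
        # Connect every pair of distinct words in this document.
        for a, b in combinations(sorted(words), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors, postings


def cluster_by_centers(docs, max_centers=10):
    """Greedily pick high-degree words as cluster centers.

    Each cluster holds the documents of the center and of its neighbors
    that no earlier center has claimed (an assumed way to get a partition).
    """
    neighbors, postings = build_lexical_graph(docs)
    # Highest degree first; ties broken alphabetically for determinism.
    ranked = sorted(postings, key=lambda w: (-len(neighbors.get(w, ())), w))
    clusters = {}
    assigned = set()
    for center in ranked:
        if len(clusters) >= max_centers or len(assigned) == len(docs):
            break
        members = set(postings[center])
        for neighbor in neighbors.get(center, ()):
            members |= postings[neighbor]
        members -= assigned
        if members:
            clusters[center] = members
            assigned |= members
    return clusters
```

Calling `cluster_by_centers` on a list of token lists returns a dictionary that maps each center word to the set of document indices in its cluster, so the center word doubles as the cluster description. With document-level co-occurrence, a frequent word can pull most of the collection into one cluster, so a real system would likely need stop-word filtering or a narrower co-occurrence window.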