Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages

2008 IEEE International Conference on Data Mining Workshops. Pub Date: 2008-12-15. DOI: 10.1109/ICDMW.2008.122

Xinghua Li, Xindong Wu, Xuegang Hu, Fei Xie, Zhaozhong Jiang

Citations: 11

Abstract

This paper presents a new keyword extraction algorithm for Chinese news Web pages that uses lexical chains and word co-occurrence, combined with frequency features, cohesion features, and correlation features. A lexical chain is an outward expression of the cohesion formed by semantically related words in a text, and it represents the semantic content of a portion of that text. Word co-occurrence distribution is an important statistical model, widely used in natural language processing, that reflects the correlation between words. Our proposed algorithm, KELCC, combines lexical chains and word co-occurrence to extract keywords from Chinese news Web pages. The algorithm is not domain-specific and can be applied to a single Web page without a corpus. Experiments on randomly selected Web pages were performed to demonstrate the quality of the keywords extracted by the proposed algorithm.
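To make the word co-occurrence idea concrete, the following is a minimal sketch, not the KELCC algorithm itself. It assumes the page text has already been segmented into sentences and words (for Chinese this would require a word segmenter, which is outside this sketch), and it uses a made-up score, term frequency plus total sentence-level co-occurrence count, purely for illustration. The function names `cooccurrence_counts` and `rank_candidates` and the sample sentences are hypothetical; lexical chains are not modeled here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair appears in the same sentence."""
    pair_counts = Counter()
    for tokens in sentences:
        # Use the set of distinct words so a pair counts once per sentence.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def rank_candidates(sentences, top_k=3):
    """Illustrative score (an assumption): frequency plus co-occurrence total."""
    freq = Counter(tok for tokens in sentences for tok in tokens)
    pair_counts = cooccurrence_counts(sentences)
    cooc_total = Counter()
    for (a, b), n in pair_counts.items():
        cooc_total[a] += n
        cooc_total[b] += n
    scores = {w: freq[w] + cooc_total[w] for w in freq}
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Hypothetical pre-segmented sentences from a single news page.
sentences = [
    ["market", "stocks", "fell"],
    ["market", "investors", "worried"],
    ["stocks", "market", "rebound"],
]
pairs = cooccurrence_counts(sentences)
top = rank_candidates(sentences, top_k=2)
```

Because the counts come only from the sentences of one document, this kind of statistic needs no external corpus, which matches the single-page setting the abstract describes. How the paper actually weights frequency, cohesion, and correlation features is not stated in the abstract.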



