{"title":"Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages","authors":"Xinghua Li, Xindong Wu, Xuegang Hu, Fei Xie, Zhaozhong Jiang","doi":"10.1109/ICDMW.2008.122","DOIUrl":null,"url":null,"abstract":"This paper presents a new keyword extraction algorithm for Chinese news Web pages using lexical chains and word co-occurrence combined with frequency features, cohesion features, and corelation features. A lexical chain is an external performance consistency by semantically related words of a text, and is the representation of the semantic content of a portion of the text. Word co-occurrence distribution is an important statistical model widely used in natural language processing that reflects the correlation of the words. Lexical chains and word co-occurrence are combined in this paper to extract keywords for Chinese news Web pages in our proposed algorithm KELCC. This algorithm is not domain-specific and can be applied to a single Web page without corpus. Experiments on randomly selected Web pages have been performed to demonstrate the quality of the keywords extracted by our proposed algorithm.","PeriodicalId":175955,"journal":{"name":"2008 IEEE International Conference on Data Mining Workshops","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Data Mining Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW.2008.122","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
This paper presents a new keyword extraction algorithm for Chinese news Web pages using lexical chains and word co-occurrence combined with frequency features, cohesion features, and corelation features. A lexical chain is an external performance consistency by semantically related words of a text, and is the representation of the semantic content of a portion of the text. Word co-occurrence distribution is an important statistical model widely used in natural language processing that reflects the correlation of the words. Lexical chains and word co-occurrence are combined in this paper to extract keywords for Chinese news Web pages in our proposed algorithm KELCC. This algorithm is not domain-specific and can be applied to a single Web page without corpus. Experiments on randomly selected Web pages have been performed to demonstrate the quality of the keywords extracted by our proposed algorithm.