TextRank Algorithm by Exploiting Wikipedia for Short Text Keywords Extraction
Wengen Li, Jiabao Zhao
2016 3rd International Conference on Information Science and Control Engineering (ICISCE), published 2016-07-08
DOI: 10.1109/ICISCE.2016.151
Citation count: 38
Abstract
Short texts carry little information, so traditional keyword extraction methods often perform worse on them than expected. In this paper, we propose a graph-based ranking algorithm for short-text keyword extraction that exploits Wikipedia as an external knowledge base. To compensate for the sparse information in short texts, we use Wikipedia to enrich them. We treat each Wikipedia entry as a concept, so the semantic information of each word can be represented by its distribution over Wikipedia concepts. We then measure the similarity between words by constructing concept vectors. Finally, we construct a keyword matrix and apply TextRank to extract keywords. Comparative experiments against traditional TextRank and a baseline algorithm show that our method achieves higher precision, recall, and F-measure, indicating that TextRank enhanced with Wikipedia is better suited to short-text keyword extraction.
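The pipeline in the abstract (concept vectors, word similarity, keyword matrix, TextRank) can be sketched as follows. The abstract does not specify the similarity measure or how the matrix is built, so this is a minimal sketch under stated assumptions: cosine similarity between sparse concept vectors, a fully connected word graph weighted by that similarity, and the standard weighted TextRank update with damping factor d = 0.85. The concept weights in the hypothetical `TOY_VECTORS` are hand-made for illustration, not real Wikipedia data, and all function names are illustrative rather than the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

# A word's concept vector: {Wikipedia concept title: weight}.
ConceptVector = Dict[str, float]

# HYPOTHETICAL toy data: made-up weights, not derived from Wikipedia.
TOY_VECTORS: Dict[str, ConceptVector] = {
    "python": {"Python (programming language)": 0.9, "Programming language": 0.6},
    "java": {"Java (programming language)": 0.9, "Programming language": 0.7},
    "compiler": {"Compiler": 0.8, "Programming language": 0.5},
    "coffee": {"Coffee": 0.9, "Java (island)": 0.2},
}


def cosine(u: ConceptVector, v: ConceptVector) -> float:
    """Cosine similarity between two sparse concept vectors (assumed measure)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_matrix(words: List[str], vectors: Dict[str, ConceptVector]) -> List[List[float]]:
    """Symmetric word-by-word similarity matrix with a zero diagonal."""
    n = len(words)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(vectors.get(words[i], {}), vectors.get(words[j], {}))
            m[i][j] = m[j][i] = s
    return m


def textrank(weights: List[List[float]], d: float = 0.85,
             iters: int = 100, tol: float = 1e-8) -> List[float]:
    """Weighted TextRank: WS(i) = (1 - d) + d * sum_j w_ji / sum_k w_jk * WS(j)."""
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            acc = 0.0
            for j in range(n):
                # Each neighbour j passes on its score in proportion to edge weight.
                if weights[j][i] > 0.0 and out_sum[j] > 0.0:
                    acc += weights[j][i] / out_sum[j] * scores[j]
            new.append((1.0 - d) + d * acc)
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def extract_keywords(words: List[str], vectors: Dict[str, ConceptVector],
                     top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank candidate words by TextRank over the concept-similarity graph."""
    scores = textrank(similarity_matrix(words, vectors))
    ranked = sorted(zip(words, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    for word, score in extract_keywords(list(TOY_VECTORS), TOY_VECTORS, top_k=4):
        print(f"{word}: {score:.4f}")
```

In this toy input, "coffee" shares no concept with the other words, so it has no edges and keeps only the base score 1 - d; the three programming-related words reinforce one another through the shared "Programming language" concept. This illustrates the intended effect of the enrichment step: words with no lexical overlap in a short text can still be linked through shared Wikipedia concepts.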