PubTag: Generating Research Tag-Clouds with Keyphrase Extraction and Learning-to-Rank
Authors: Paula Rios, A. Hogan
Venue: 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)
Publication date: 2018-12-01
DOI: 10.1109/WI.2018.00-12 (https://doi.org/10.1109/WI.2018.00-12)
Citation count: 1
Abstract
We investigate automated methods for generating tag clouds for Computer Science researchers, based on keyphrase extraction and learning-to-rank models. Given the identifier of an author in a bibliographical database (currently DBLP) as input, the method extracts links to the PDFs containing the full text of the author's papers. Keyphrase extraction methods are then applied to extract multi-term tags from the text. To select the most important tags for the researcher, we propose a set of features that serve as input to a variety of learning-to-rank models. The evaluation involves 12 Computer Science professors, who score a selection of keyphrases extracted from their papers according to how relevant each is as a description of their research topics. These scores are used to train and compare various learning-to-rank models for reordering the most important keyphrases, which are in turn used to generate the final tag clouds for the professors. We further validate the proposed approaches by asking the professors to evaluate the final tag clouds.
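To make the pipeline in the abstract more concrete, the following is a minimal sketch of its three stages: multi-term candidate extraction, per-phrase features, and ranking. It is not the authors' implementation. The stopword list, the three features (total frequency, fraction of papers containing the phrase, phrase length), the function names, and the fixed weights are all assumptions chosen for illustration. In a learning-to-rank setting such as the one described, the weights would instead be learned from the professors' relevance scores.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with",
             "we", "is", "are", "by"}


def candidate_phrases(text, max_len=3):
    """Return multi-term candidates: n-grams (2..max_len words) taken from
    runs of consecutive non-stopword tokens within a punctuation-delimited segment."""
    phrases = []
    for segment in re.split(r"[.,;:!?()]", text.lower()):
        run = []
        # The trailing "" is a sentinel that flushes the last run of the segment.
        for tok in segment.split() + [""]:
            if tok and tok not in STOPWORDS:
                run.append(tok)
                continue
            for n in range(2, max_len + 1):
                for i in range(len(run) - n + 1):
                    phrases.append(" ".join(run[i:i + n]))
            run = []
    return phrases


def phrase_features(papers):
    """Map each candidate phrase to an illustrative feature tuple:
    (total frequency, fraction of papers containing it, number of words)."""
    tf, df = Counter(), Counter()
    for text in papers:
        phrases = candidate_phrases(text)
        tf.update(phrases)
        df.update(set(phrases))
    return {p: (tf[p], df[p] / len(papers), len(p.split())) for p in tf}


def rank_tags(papers, weights=(1.0, 2.0, 0.5), top_k=5):
    """Score phrases with a fixed linear model and return the top_k tags.
    The weights are placeholders standing in for a trained ranking model."""
    feats = phrase_features(papers)
    scored = {p: sum(w * x for w, x in zip(weights, f)) for p, f in feats.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scored, key=lambda p: (-scored[p], p))[:top_k]


# Toy input standing in for the full text extracted from an author's PDFs.
papers = [
    "Keyphrase extraction for tag clouds.",
    "We study keyphrase extraction and learning-to-rank models.",
]
print(rank_tags(papers))
```

The linear scorer is a pointwise stand-in. Swapping in a trained pairwise or listwise ranker would leave the extraction and feature steps unchanged and only replace the scoring inside `rank_tags`.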