A Scalable Hybrid Research Paper Recommender System for Microsoft Academic

The World Wide Web Conference. Pub Date: 2019-05-13. DOI: 10.1145/3308558.3313700

Anshul Kanakia, Zhihong Shen, Darrin Eide, Kuansan Wang


Citations: 58

Abstract

We present the design and methodology for the large-scale hybrid paper recommender system used by Microsoft Academic. The system provides recommendations for approximately 160 million English research papers and patents. Our approach handles incomplete citation information while also alleviating the cold-start problem that often affects other recommender systems. We use the Microsoft Academic Graph (MAG), titles, and available abstracts of research papers to build a recommendation list for all documents, thereby combining co-citation and content-based approaches. Tuning system parameters also allows for blending and prioritization of each approach, which in turn allows us to balance paper novelty versus authority in recommendation results. We evaluate the generated recommendations via a user study of 40 participants, with over 2400 recommendation pairs graded, and discuss the quality of the results using P@10 and nDCG scores. We see that there is a strong correlation between participant scores and the similarity rankings produced by our system, but that additional focus needs to be put towards improving recommender precision, particularly for content-based recommendations. The results of the user survey and associated analysis scripts are made available via GitHub, and the recommendations produced by our system are available as part of the MAG on Azure to facilitate further research and light up novel research paper recommendation applications.
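For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how P@10 and nDCG are commonly computed for a single ranked recommendation list. It is not the authors' analysis script: the function names, the boolean relevance input, and the linear-gain form of DCG are assumptions made for illustration, and the paper may use a different grading scale or gain function.

```python
import math
from typing import Sequence


def precision_at_k(relevant: Sequence[bool], k: int = 10) -> float:
    """Fraction of the top-k positions judged relevant (hypothetical helper).

    `relevant[i]` is True when the recommendation at rank i was judged
    relevant; how a participant grade maps to True/False is an assumption
    left to the caller.
    """
    if k <= 0:
        return 0.0
    return sum(relevant[:k]) / k


def dcg_at_k(grades: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))


def ndcg_at_k(grades: Sequence[float], k: int = 10) -> float:
    """DCG of the list as ranked, divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0
```

In this sketch, `grades` holds one participant's graded relevance for each recommended paper in the order the system ranked them. A list already sorted from most to least relevant scores an nDCG of 1.0, and any misordering lowers the score.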

