Semantic Interest Modeling and Content-Based Scientific Publication Recommendation Using Word Embeddings and Sentence Encoders

IF 2.4 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Mouadh Guesmi, Mohamed Amine Chatti, Lamees Kadhim, Shoeb Joarder, Qurat Ul Ain
{"title":"Semantic Interest Modeling and Content-Based Scientific Publication Recommendation Using Word Embeddings and Sentence Encoders","authors":"Mouadh Guesmi, Mohamed Amine Chatti, Lamees Kadhim, Shoeb Joarder, Qurat Ul Ain","doi":"10.3390/mti7090091","DOIUrl":null,"url":null,"abstract":"The fast growth of data in the academic field has contributed to making recommendation systems for scientific papers more popular. Content-based filtering (CBF), a pivotal technique in recommender systems (RS), holds particular significance in the realm of scientific publication recommendations. In a content-based scientific publication RS, recommendations are composed by observing the features of users and papers. Content-based recommendation encompasses three primary steps, namely, item representation, user modeling, and recommendation generation. A crucial part of generating recommendations is the user modeling process. Nevertheless, this step is often neglected in existing content-based scientific publication RS. Moreover, most existing approaches do not capture the semantics of user models and papers. To address these limitations, in this paper we present a transparent Recommendation and Interest Modeling Application (RIMA), a content-based scientific publication RS that implicitly derives user interest models from their authored papers. To address the semantic issues, RIMA combines word embedding-based keyphrase extraction techniques with knowledge bases to generate semantically-enriched user interest models, and additionally leverages pretrained transformer sentence encoders to represent user models and papers and compute their similarities. The effectiveness of our approach was assessed through an offline evaluation by conducting extensive experiments on various datasets along with user study (N = 22), demonstrating that (a) combining SIFRank and SqueezeBERT as an embedding-based keyphrase extraction method with DBpedia as a knowledge base improved the quality of the user interest modeling step, and (b) using the msmarco-distilbert-base-tas-b sentence transformer model achieved better results in the recommendation generation step.","PeriodicalId":52297,"journal":{"name":"Multimodal Technologies and Interaction","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimodal Technologies and Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/mti7090091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

The fast growth of data in the academic field has contributed to making recommendation systems for scientific papers more popular. Content-based filtering (CBF), a pivotal technique in recommender systems (RS), holds particular significance in the realm of scientific publication recommendations. In a content-based scientific publication RS, recommendations are composed by observing the features of users and papers. Content-based recommendation encompasses three primary steps, namely, item representation, user modeling, and recommendation generation. A crucial part of generating recommendations is the user modeling process. Nevertheless, this step is often neglected in existing content-based scientific publication RS. Moreover, most existing approaches do not capture the semantics of user models and papers. To address these limitations, in this paper we present a transparent Recommendation and Interest Modeling Application (RIMA), a content-based scientific publication RS that implicitly derives user interest models from their authored papers. To address the semantic issues, RIMA combines word embedding-based keyphrase extraction techniques with knowledge bases to generate semantically-enriched user interest models, and additionally leverages pretrained transformer sentence encoders to represent user models and papers and compute their similarities. The effectiveness of our approach was assessed through an offline evaluation by conducting extensive experiments on various datasets along with user study (N = 22), demonstrating that (a) combining SIFRank and SqueezeBERT as an embedding-based keyphrase extraction method with DBpedia as a knowledge base improved the quality of the user interest modeling step, and (b) using the msmarco-distilbert-base-tas-b sentence transformer model achieved better results in the recommendation generation step.
基于词嵌入和句子编码器的语义兴趣建模和基于内容的科学出版物推荐
学术领域数据的快速增长使得科学论文推荐系统更加普及。基于内容的过滤(CBF)是推荐系统(RS)中的一项关键技术,在科学出版物推荐领域具有特殊的意义。在基于内容的科学出版物RS中,通过观察用户和论文的特征来组成推荐。基于内容的推荐包括三个主要步骤,即项目表示、用户建模和推荐生成。生成推荐的一个关键部分是用户建模过程。然而,在现有的基于内容的科学出版物RS中,这一步经常被忽略。此外,大多数现有方法都没有捕获用户模型和论文的语义。为了解决这些限制,在本文中,我们提出了一个透明的推荐和兴趣建模应用程序(RIMA),这是一个基于内容的科学出版物RS,它隐含地从用户撰写的论文中派生用户兴趣模型。为了解决语义问题,RIMA将基于词嵌入的关键短语提取技术与知识库相结合,生成语义丰富的用户兴趣模型,并利用预训练的转换句子编码器来表示用户模型和论文并计算它们的相似性。通过在各种数据集上进行大量实验和用户研究(N = 22)进行离线评估,评估了我们方法的有效性,结果表明(a)结合SIFRank和SqueezeBERT作为基于嵌入的关键词提取方法,以DBpedia为知识库,提高了用户兴趣建模步骤的质量,(b)使用msmarco-distilbert-base- as-b句子转换器模型在推荐生成步骤中取得了更好的效果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Multimodal Technologies and Interaction
Multimodal Technologies and Interaction Computer Science-Computer Science Applications
CiteScore
4.90
自引率
8.00%
发文量
94
审稿时长
4 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信