基于词嵌入和句子编码器的语义兴趣建模和基于内容的科学出版物推荐

IF 2.4 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Multimodal Technologies and Interaction Pub Date : 2023-09-15 DOI:10.3390/mti7090091

Mouadh Guesmi, Mohamed Amine Chatti, Lamees Kadhim, Shoeb Joarder, Qurat Ul Ain

{"title":"基于词嵌入和句子编码器的语义兴趣建模和基于内容的科学出版物推荐","authors":"Mouadh Guesmi, Mohamed Amine Chatti, Lamees Kadhim, Shoeb Joarder, Qurat Ul Ain","doi":"10.3390/mti7090091","DOIUrl":null,"url":null,"abstract":"The fast growth of data in the academic field has contributed to making recommendation systems for scientific papers more popular. Content-based filtering (CBF), a pivotal technique in recommender systems (RS), holds particular significance in the realm of scientific publication recommendations. In a content-based scientific publication RS, recommendations are composed by observing the features of users and papers. Content-based recommendation encompasses three primary steps, namely, item representation, user modeling, and recommendation generation. A crucial part of generating recommendations is the user modeling process. Nevertheless, this step is often neglected in existing content-based scientific publication RS. Moreover, most existing approaches do not capture the semantics of user models and papers. To address these limitations, in this paper we present a transparent Recommendation and Interest Modeling Application (RIMA), a content-based scientific publication RS that implicitly derives user interest models from their authored papers. To address the semantic issues, RIMA combines word embedding-based keyphrase extraction techniques with knowledge bases to generate semantically-enriched user interest models, and additionally leverages pretrained transformer sentence encoders to represent user models and papers and compute their similarities. The effectiveness of our approach was assessed through an offline evaluation by conducting extensive experiments on various datasets along with user study (N = 22), demonstrating that (a) combining SIFRank and SqueezeBERT as an embedding-based keyphrase extraction method with DBpedia as a knowledge base improved the quality of the user interest modeling step, and (b) using the msmarco-distilbert-base-tas-b sentence transformer model achieved better results in the recommendation generation step.","PeriodicalId":52297,"journal":{"name":"Multimodal Technologies and Interaction","volume":"39 1","pages":"0"},"PeriodicalIF":2.4000,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic Interest Modeling and Content-Based Scientific Publication Recommendation Using Word Embeddings and Sentence Encoders\",\"authors\":\"Mouadh Guesmi, Mohamed Amine Chatti, Lamees Kadhim, Shoeb Joarder, Qurat Ul Ain\",\"doi\":\"10.3390/mti7090091\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The fast growth of data in the academic field has contributed to making recommendation systems for scientific papers more popular. Content-based filtering (CBF), a pivotal technique in recommender systems (RS), holds particular significance in the realm of scientific publication recommendations. In a content-based scientific publication RS, recommendations are composed by observing the features of users and papers. Content-based recommendation encompasses three primary steps, namely, item representation, user modeling, and recommendation generation. A crucial part of generating recommendations is the user modeling process. Nevertheless, this step is often neglected in existing content-based scientific publication RS. Moreover, most existing approaches do not capture the semantics of user models and papers. To address these limitations, in this paper we present a transparent Recommendation and Interest Modeling Application (RIMA), a content-based scientific publication RS that implicitly derives user interest models from their authored papers. To address the semantic issues, RIMA combines word embedding-based keyphrase extraction techniques with knowledge bases to generate semantically-enriched user interest models, and additionally leverages pretrained transformer sentence encoders to represent user models and papers and compute their similarities. The effectiveness of our approach was assessed through an offline evaluation by conducting extensive experiments on various datasets along with user study (N = 22), demonstrating that (a) combining SIFRank and SqueezeBERT as an embedding-based keyphrase extraction method with DBpedia as a knowledge base improved the quality of the user interest modeling step, and (b) using the msmarco-distilbert-base-tas-b sentence transformer model achieved better results in the recommendation generation step.\",\"PeriodicalId\":52297,\"journal\":{\"name\":\"Multimodal Technologies and Interaction\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2023-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Multimodal Technologies and Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/mti7090091\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimodal Technologies and Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/mti7090091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

学术领域数据的快速增长使得科学论文推荐系统更加普及。基于内容的过滤(CBF)是推荐系统(RS)中的一项关键技术，在科学出版物推荐领域具有特殊的意义。在基于内容的科学出版物RS中，通过观察用户和论文的特征来组成推荐。基于内容的推荐包括三个主要步骤，即项目表示、用户建模和推荐生成。生成推荐的一个关键部分是用户建模过程。然而，在现有的基于内容的科学出版物RS中，这一步经常被忽略。此外，大多数现有方法都没有捕获用户模型和论文的语义。为了解决这些限制，在本文中，我们提出了一个透明的推荐和兴趣建模应用程序(RIMA)，这是一个基于内容的科学出版物RS，它隐含地从用户撰写的论文中派生用户兴趣模型。为了解决语义问题，RIMA将基于词嵌入的关键短语提取技术与知识库相结合，生成语义丰富的用户兴趣模型，并利用预训练的转换句子编码器来表示用户模型和论文并计算它们的相似性。通过在各种数据集上进行大量实验和用户研究(N = 22)进行离线评估，评估了我们方法的有效性，结果表明(a)结合SIFRank和SqueezeBERT作为基于嵌入的关键词提取方法，以DBpedia为知识库，提高了用户兴趣建模步骤的质量，(b)使用msmarco-distilbert-base- as-b句子转换器模型在推荐生成步骤中取得了更好的效果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Semantic Interest Modeling and Content-Based Scientific Publication Recommendation Using Word Embeddings and Sentence Encoders

The fast growth of data in the academic field has contributed to making recommendation systems for scientific papers more popular. Content-based filtering (CBF), a pivotal technique in recommender systems (RS), holds particular significance in the realm of scientific publication recommendations. In a content-based scientific publication RS, recommendations are composed by observing the features of users and papers. Content-based recommendation encompasses three primary steps, namely, item representation, user modeling, and recommendation generation. A crucial part of generating recommendations is the user modeling process. Nevertheless, this step is often neglected in existing content-based scientific publication RS. Moreover, most existing approaches do not capture the semantics of user models and papers. To address these limitations, in this paper we present a transparent Recommendation and Interest Modeling Application (RIMA), a content-based scientific publication RS that implicitly derives user interest models from their authored papers. To address the semantic issues, RIMA combines word embedding-based keyphrase extraction techniques with knowledge bases to generate semantically-enriched user interest models, and additionally leverages pretrained transformer sentence encoders to represent user models and papers and compute their similarities. The effectiveness of our approach was assessed through an offline evaluation by conducting extensive experiments on various datasets along with user study (N = 22), demonstrating that (a) combining SIFRank and SqueezeBERT as an embedding-based keyphrase extraction method with DBpedia as a knowledge base improved the quality of the user interest modeling step, and (b) using the msmarco-distilbert-base-tas-b sentence transformer model achieved better results in the recommendation generation step.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Multimodal Technologies and Interaction Computer Science-Computer Science Applications

CiteScore

4.90

自引率

8.00%

发文量

审稿时长

4 weeks