Discovering Search Space Using M-distance Clustering of Semantic Relatedness Based Weighted Network for the Content-based Recommender System

Impact factor: 0.6 | JCR Q3 | Information Science & Library Science

Journal of Scientometric Research | Pub Date: 2023-08-06 | DOI: 10.5530/jscires.12.2.024

Mayur Makawana, Rupa G. Mehta


Citations: 0

Abstract

Researchers must identify relevant documents to keep up with the latest advances in their domain, and document recommendation systems help them do so. Proposed approaches to such systems draw on textual content, collaborative filtering, and citation information. Content-based techniques use the full text of papers and produce more promising results, but comparing the input document's text against every document in the dataset is impractical. This study investigates whether bibliographic data can reduce the number of comparisons. The proposed system assumes that two scientific papers are semantically connected if they are co-cited more frequently than would be expected by chance. The likelihood of co-citation, also called semantic relatedness, quantifies this connection. This work presents a new way to distribute weight among connected scholarly documents based on the semantic relatedness score. By gathering scholarly document pairs with high likelihood values and using them as the search space for the content-based recommender system, the proposed solution eliminates a substantial number of needless text comparisons. By propagating the co-citation relationship out to a certain distance, the approach also finds relevant documents that traditional co-citation searches miss. The results show that the system significantly reduces computation and detects false positives in content comparison using Doc2vec.
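The idea in the abstract can be illustrated with a small sketch. It is not the authors' method: the abstract does not give the likelihood formula or the details of the M-distance clustering, so the code below assumes a simple log ratio of observed to chance-expected co-citations as the relatedness score and a hop-limited breadth-first expansion as the "distance". The function names `cocitation_relatedness` and `search_space`, and the parameters `min_score` and `max_hops`, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def cocitation_relatedness(citing_lists, min_score=0.0):
    """Build a weighted graph from co-citation data.

    citing_lists maps each citing paper to the papers it cites.
    ASSUMPTION: the score is log(observed / expected co-citations),
    where "expected" treats citations as independent events. The
    paper's actual likelihood measure may differ.
    """
    n = len(citing_lists)
    cited_by = defaultdict(int)   # how many papers cite each document
    co_cited = defaultdict(int)   # how many papers cite both documents
    for refs in citing_lists.values():
        refs = sorted(set(refs))
        for r in refs:
            cited_by[r] += 1
        for a, b in combinations(refs, 2):
            co_cited[(a, b)] += 1

    graph = defaultdict(dict)
    for (a, b), observed in co_cited.items():
        expected = cited_by[a] * cited_by[b] / n
        score = math.log(observed / expected)
        if score > min_score:     # keep only "more often than chance"
            graph[a][b] = score
            graph[b][a] = score
    return graph


def search_space(graph, seed, max_hops=2):
    """Return {document: hop distance} for documents within max_hops
    of seed. ASSUMPTION: hop count stands in for the paper's distance."""
    seen = {seed: 0}
    frontier = [seed]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, {}):
                if neighbour not in seen:
                    seen[neighbour] = hop
                    next_frontier.append(neighbour)
        frontier = next_frontier
    del seen[seed]
    return seen


# Toy data: A and C are never co-cited, but both are co-cited with B.
citing = {
    "c1": ["A", "B"], "c2": ["A", "B"],
    "c3": ["B", "C"], "c4": ["B", "C"],
    "c5": ["D"], "c6": ["E"],
}
graph = cocitation_relatedness(citing)
candidates = search_space(graph, "A", max_hops=2)
print(candidates)
```

In this toy dataset, a direct co-citation lookup for A would return only B, while the two-hop expansion also reaches C. Only the documents in `candidates` would then be passed to the expensive full-text comparison (Doc2vec in the paper), and D and E are never compared.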


Source journal

Journal of Scientometric Research, INFORMATION SCIENCE & LIBRARY SCIENCE

CiteScore

1.30

Self-citation rate

12.50%

Articles published