Exploring the Proportion of Content Represented by the Metadata of Research Articles

2020 3rd International Conference on Advancements in Computational Sciences (ICACS), Pub Date: 2020-02-01, DOI: 10.1109/ICACS47775.2020.9055955

Shahzad Nazir, M. Asif, Shahbaz Ahmad


Citations: 3

Abstract

Finding relevant research articles is an important part of tracking the state of the art, and systems built for this task are known as research paper recommender systems. With the massive growth of research corpora, the research community has turned its focus towards finding the most relevant research papers. Researchers have adopted techniques based on bibliographic information, content, and collaborative filtering. The content-based approach is the most common: according to a survey, 55% of research paper recommender systems use it. Due to the unavailability of the full text of research papers, however, researchers have started using metadata instead. It remains unclear what proportion of the full content the metadata can represent. This research explored the portion of the full content that is captured by the metadata of research articles. We applied two techniques. In the first, we computed TF-IDF over the metadata and the full content and considered the intersection of key terms. In the second, we calculated similarity scores between the metadata and the full content using cosine similarity. The approach was assessed on a dataset of 271 research articles automatically downloaded from CiteSeerX. The results revealed that the metadata of research articles could effectively represent 47% of the full content.
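The two measurements named in the abstract (key-term intersection after TF-IDF, and cosine similarity between metadata and full content) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the smoothed IDF formula, the top-k cutoff, and all function names (`tokenize`, `tfidf_vectors`, `key_term_overlap`, `cosine_similarity`) are assumptions chosen for illustration, and the sample strings are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumption: length-normalized term frequency times a smoothed IDF,
    ln((1 + N) / (1 + df)) + 1. The paper does not state its weighting.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def top_terms(vector, k):
    """Top-k terms by weight; ties are broken alphabetically for determinism."""
    return set(sorted(vector, key=lambda t: (-vector[t], t))[:k])


def key_term_overlap(meta_vec, full_vec, k=10):
    """Fraction of the full text's top-k key terms also among the metadata's top-k."""
    full_top = top_terms(full_vec, k)
    if not full_top:
        return 0.0
    return len(top_terms(meta_vec, k) & full_top) / len(full_top)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Invented toy strings standing in for one article's metadata and full text.
    metadata = "Metadata based research paper recommender systems"
    full_text = (
        "Research paper recommender systems often rely on content. "
        "When full text is missing, metadata such as title and abstract is used."
    )
    meta_vec, full_vec = tfidf_vectors([metadata, full_text])
    print("key-term overlap:", key_term_overlap(meta_vec, full_vec, k=5))
    print("cosine similarity:", cosine_similarity(meta_vec, full_vec))
```

In a study like the one described, the IDF statistics would presumably be computed over the whole collection of articles rather than over a single metadata and full-text pair as in this toy run, and the per-article scores would then be aggregated. Those choices are not specified in the abstract.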

