Exploring the Proportion of Content Represented by the Metadata of Research Articles

Shahzad Nazir, M. Asif, Shahbaz Ahmad
2020 3rd International Conference on Advancements in Computational Sciences (ICACS), February 2020. DOI: 10.1109/ICACS47775.2020.9055955
Citations: 3

Abstract

Finding relevant research articles is an important task for keeping track of state-of-the-art work, and systems that automate it are termed research paper recommender systems. Given the massive growth of research corpora, the research community has focused on finding the most relevant research papers. Researchers have adopted bibliographic-information-based, content-based, and collaborative-filtering-based techniques. The most common approach for research paper recommender systems is content-based: according to a survey, 55% of research paper recommender systems use a content-based approach. However, because the full text of research papers is often unavailable, researchers have started using metadata instead, and it is still unclear what proportion of the full content the metadata can represent. This research explores how much of the full content is captured by the metadata of research articles. We applied two techniques. First, we applied TF-IDF to both the metadata and the full content and took the intersection of the resulting key terms. Second, we calculated cosine similarity scores between the metadata and the full content. The approach was evaluated on a dataset of 271 research articles automatically downloaded from CiteseerX. The results revealed that the metadata of research articles can effectively represent 47% of the full content.
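The two measurements described in the abstract (key-term intersection under TF-IDF, and cosine similarity between metadata and full content) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the tokenization, stopword list, TF-IDF weighting variant, number of key terms, or how the intersection is normalized. Therefore `tokenize`, `STOPWORDS`, the smoothed IDF formula, the top-k cut-off (`k=10`), and the overlap ratio (shared top-k terms divided by the full text's top-k terms) are all hypothetical choices. Metadata is assumed to be plain text such as title, abstract, and keywords.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; the paper's preprocessing is not described.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on", "for",
             "is", "are", "we", "this", "that", "with", "by", "as", "be"}


def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords and 1-character tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 1]


def idf_table(documents):
    """Smoothed IDF log((1 + N) / (1 + df)) + 1 over tokenized documents.

    Also returns the IDF of a term never seen in the corpus (df = 0)."""
    n = len(documents)
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    unseen = math.log(1 + n) + 1
    return idf, unseen


def tfidf(tokens, idf, unseen):
    """Sparse TF-IDF vector: raw term count multiplied by the term's IDF."""
    counts = Counter(tokens)
    return {t: c * idf.get(t, unseen) for t, c in counts.items()}


def top_terms(vec, k):
    """The k highest-weighted terms (ties broken alphabetically for determinism)."""
    ranked = sorted(vec.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def evaluate(pairs, k=10):
    """pairs: list of (metadata_text, full_text), one per article.

    Returns (mean top-k key-term overlap, mean cosine similarity)."""
    if not pairs:
        raise ValueError("need at least one (metadata, full text) pair")
    meta_docs = [tokenize(m) for m, _ in pairs]
    full_docs = [tokenize(f) for _, f in pairs]
    # Assumption: IDF is estimated on the full-text corpus only.
    idf, unseen = idf_table(full_docs)
    overlaps, cosines = [], []
    for meta_tokens, full_tokens in zip(meta_docs, full_docs):
        meta_vec = tfidf(meta_tokens, idf, unseen)
        full_vec = tfidf(full_tokens, idf, unseen)
        top_meta, top_full = top_terms(meta_vec, k), top_terms(full_vec, k)
        overlaps.append(len(top_meta & top_full) / len(top_full) if top_full else 0.0)
        cosines.append(cosine(meta_vec, full_vec))
    return sum(overlaps) / len(pairs), sum(cosines) / len(pairs)


if __name__ == "__main__":
    # Toy, made-up example pair; not data from the paper.
    demo = [("Content-based research paper recommender using metadata",
             "We study content-based recommender systems for research papers "
             "and compare metadata with the full text of each paper.")]
    print(evaluate(demo, k=5))
```

The sketch estimates IDF only over full texts so that a term appearing in both an article's metadata and its body is not counted twice for the same paper. A per-article overlap ratio and a per-article cosine score are then averaged across the corpus. That mirrors the abstract's single summary figure, but how the authors actually aggregated their 271 articles is not stated.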