Using R to develop a corpus of full-text journal articles

IF 1.7 | Zone 4, Management Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Information Science, published 2023-07-14, DOI: 10.1177/01655515231171362

Billie Anderson, M. Bani-Yaghoub, Vagmi Kantheti, Scott Curtis

Citations: 0

Abstract

Over the past two decades, databases and tools that provide easy access to them have become increasingly available, allowing historical and modern-day topics to be merged and studied together. During the recent COVID-19 pandemic, for example, many researchers reflected on whether lessons learned from the 1918 Spanish flu pandemic could have helped in the present one. Text-mining studies, however, rarely use full-text journal articles. This article presents a methodology for developing a corpus of full-text journal articles with the R fulltext package; applying it yielded 2743 full-text journal articles. The aim is to give researchers a methodology and supplementary code for using the R fulltext package to curate a full-text journal corpus.

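Source journal

Journal of Information Science (Engineering & Technology / Computer Science: Information Systems)

CiteScore: 6.80
Self-citation rate: 8.30%
Articles published: 121
Review time: 4 months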

Journal description: The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.