Using R to develop a corpus of full-text journal articles
Billie Anderson, M. Bani-Yaghoub, Vagmi Kantheti, Scott Curtis
Journal of Information Science. Published 2023-07-14. DOI: 10.1177/01655515231171362 (https://doi.org/10.1177/01655515231171362)
Abstract
Over the past two decades, databases and simple tools for accessing them have become increasingly available, allowing historical and modern-day topics to be merged and studied. Throughout the recent COVID-19 pandemic, for example, many researchers have reflected on whether any lessons learned from the Spanish flu pandemic of 1918 could have been helpful in the present pandemic. Studies using text-mining applications rarely use full-text journal articles. This article presents a methodology for developing a full-text journal article corpus using the R fulltext package. Using the proposed methodology, 2743 full-text journal articles were obtained. The aim of this article is to provide a methodology and supplementary code that researchers can use with the R fulltext package to curate a full-text journal article corpus.
About the journal:
The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.