Mining Art History: Bulk Converting Nonstandard PDFs to Text to Determine the Frequency of Citations and Key Terms in Humanities Articles

A. Wasielewski, A. Dahlgren
{"title":"Mining Art History: Bulk Converting Nonstandard PDFs to Text to Determine the Frequency of Citations and Key Terms in Humanities Articles","authors":"A. Wasielewski, A. Dahlgren","doi":"10.16993/BBK.L","DOIUrl":null,"url":null,"abstract":"Text mining in art history scholarship can tell us about the discipline itself, as well as artistic concerns at any given moment. The aim of this study is to develop and test a strategy for text mining from PDFs of journal articles that have nonstandard formatting and/or use notes rather than full bibliographies for references. While articles in the natural and social sciences typically adhere to standard formats, art history journals employ a variety of formatting styles that make bulk capture of citation and other textual data from the articles challenging. This study outlines a method by which researchers can extract data from journals articles, using a sample set from art history. Once extracted, the data from PDFs can be used to compare frequently used terms across samples and determine which scholars are most cited in either bibliographies or the main body text of articles. If the structure and layout of individual journals are carefully considered and the data is properly cleaned, a clear picture of the disciplinary influences and dependencies of the scholarship through citations and key terms can be obtained.","PeriodicalId":332163,"journal":{"name":"Digital Human Sciences: New Objects – New Approaches","volume":"160 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital Human Sciences: New Objects – New Approaches","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.16993/BBK.L","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Text mining in art history scholarship can tell us about the discipline itself, as well as artistic concerns at any given moment. The aim of this study is to develop and test a strategy for text mining from PDFs of journal articles that have nonstandard formatting and/or use notes rather than full bibliographies for references. While articles in the natural and social sciences typically adhere to standard formats, art history journals employ a variety of formatting styles that make bulk capture of citation and other textual data from the articles challenging. This study outlines a method by which researchers can extract data from journals articles, using a sample set from art history. Once extracted, the data from PDFs can be used to compare frequently used terms across samples and determine which scholars are most cited in either bibliographies or the main body text of articles. If the structure and layout of individual journals are carefully considered and the data is properly cleaned, a clear picture of the disciplinary influences and dependencies of the scholarship through citations and key terms can be obtained.
挖掘艺术史:将非标准pdf文件批量转换为文本以确定人文学科文章中引文和关键术语的频率
艺术史研究中的文本挖掘可以告诉我们学科本身,以及任何特定时刻的艺术关注点。本研究的目的是开发和测试一种策略,用于从非标准格式和/或使用注释而不是完整参考书目的期刊文章的pdf中挖掘文本。虽然自然科学和社会科学的文章通常遵循标准格式,但艺术史期刊采用各种格式风格,这使得从文章中大量捕获引文和其他文本数据具有挑战性。本研究概述了一种方法,通过该方法,研究人员可以从期刊文章中提取数据,使用艺术史的样本集。一旦从pdf文件中提取数据,就可以用来比较样本中常用的术语,并确定哪些学者在参考书目或文章的主体文本中被引用最多。如果仔细考虑单个期刊的结构和布局,并对数据进行适当的清理,就可以通过引用和关键术语清楚地了解该奖学金的学科影响和依赖性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信