The Dangerous Art of Text Mining: A Methodology for Digital History by Jo Guldi (review)

IF 0.7 · Zone 3 (Philosophy) · Q2, History & Philosophy of Science

Technology and Culture Pub Date : 2024-07-19 DOI:10.1353/tech.2024.a933123

Melvin Wevers



Abstract

Reviewed by:

The Dangerous Art of Text Mining: A Methodology for Digital History by Jo Guldi
Melvin Wevers (bio)

The Dangerous Art of Text Mining: A Methodology for Digital History
By Jo Guldi. Cambridge: Cambridge University Press, 2023. Pp. 465.

In The Dangerous Art of Text Mining, historian Jo Guldi explores the application of text mining in historical research. Guldi uses text mining, a method for quantitatively analyzing digitized text, to examine British parliamentary records. The approach is portrayed as a double-edged sword, embodying both art and hazard. Guldi characterizes text mining as an art that demands specialized expertise and flexible methodologies, aligning with historians’ heuristic and hermeneutic techniques. At the same time, she warns of its dangers, such as the potential for algorithms to foster overgeneralizations, amplify biases in data, and yield conclusions that overlook historical complexities and the nuanced interpretations of past actors. Despite the book’s ostensibly alarming title, Guldi’s critique of text mining is constructive, underscoring its capacity to enhance historical research if used with discernment, thereby avoiding the pitfalls of technological solutionism that mar some big data and early digital humanities projects.

The book is structured into three main sections. The first, “Towards a Smarter Data Science,” outlines Guldi’s methodological approach, advocating for a seamless integration of data science with historical research’s nuanced source criticism. The book adopts a somewhat antagonistic stance toward data science, portraying it as naive regarding data and algorithmic bias. Guldi posits that historians are uniquely positioned to prevent the uncritical application of algorithms to historical data. While data science studies often overlook such biases (as pointed out by R. Benjamin, Race after Technology, 2019; C. O’Neil, Weapons of Math Destruction, 2016), similar issues are evident in digital humanities and digital history. Think, for example, of the frequent use in digital history of topic modeling algorithms as exploratory tools without inspecting the degree of robustness of the models. Historians could also benefit from work in areas such as explainable AI, where initiatives are being advanced to augment the transparency of AI models, ensuring that users not only trust but also comprehend the underlying mechanisms and rationale of the algorithms employed in their research (for example, Molnar, Interpretable Machine Learning, 2022). Concurrently, information retrieval scholars have developed metrics to evaluate search strategy efficacy. Such metrics deserve more attention in Guldi’s book.
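As an illustration of the kind of information retrieval metric meant here, the following minimal sketch computes precision and recall for a single search. It is not taken from Guldi's book or from the review; the function name `precision_recall` and the document IDs are hypothetical, and it assumes a small hand-labelled set of relevant documents is available for comparison.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs returned by the search strategy being evaluated.
    relevant:  IDs a historian has judged relevant by hand (assumed to exist).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents that are both returned and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs for illustration only.
keyword_hits = ["d1", "d2", "d3", "d4"]
hand_labelled = ["d2", "d4", "d7"]

p, r = precision_recall(keyword_hits, hand_labelled)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision indicates how much of what a search returns is relevant, and recall how much of the relevant material the search finds; comparing the two across search strategies is one way to make their trade-offs explicit.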

Guldi presents the strategy of critical search, which relies on the critical use of multiple algorithms to guide inquiries into large collections of historical materials. “Critical” here refers to the awareness of possible biases introduced by the data and the algorithms at every step of the process. Her language effectively resonates with historians, potentially increasing the method’s adoption. However, a more integrated approach, combining data science methods and validation strategies, would have facilitated a more cohesive amalgamation of the two disciplines. The book adeptly discusses why prediction is a contentious and difficult concept for many historians. Given the contingency of history and the sparsity of data, making predictions is notoriously difficult. Nonetheless, by treating modeling mostly as a predictive tool, Guldi overlooks other ways in which modeling could benefit historians. A broader exploration of modeling’s other applications, such as guiding explanations and understanding data biases, would have been beneficial (for example, J. Epstein, Why Model?, 2008). The breadth of examples might have been balanced with a broader overview of methodological interventions, encompassing machine learning, sampling, and time series analysis. The focus on text mining is not necessarily a flaw, but many of the techniques are repeated throughout the book, making it somewhat repetitive. Generally, the book suffers from editing issues, including errors in footnotes, inconsistencies, and repetition of content.

The second part, the book’s strongest, presents case studies addressing themes of temporal experience such as memory, periodization, and event. Guldi adeptly uses historical theory as a conduit between text mining methods, data, and historical inquiry. She demonstrates how theories from scholars like Reinhart Koselleck, William Sewell, and Astrid Erll can translate historical data analysis goals into specific methodological applications. As such, the book effectively argues for a stronger integration of theory and method in advancing (digital) history. [End Page...


Source journal: Technology and Culture (Social Sciences: History & Philosophy of Science)

CiteScore: 0.60

Self-citation rate: 14.30%

Articles published: 225

Review time: >12 weeks

About the journal: Technology and Culture, the preeminent journal of the history of technology, draws on scholarship in diverse disciplines to publish insightful pieces intended for general readers as well as specialists. Subscribers include scientists, engineers, anthropologists, sociologists, economists, museum curators, archivists, scholars, librarians, educators, historians, and many others. In addition to scholarly essays, each issue features 30-40 book reviews and reviews of new museum exhibitions. To illuminate important debates and draw attention to specific topics, the journal occasionally publishes thematic issues. Technology and Culture is the official journal of the Society for the History of Technology (SHOT).