The Dangerous Art of Text Mining: A Methodology for Digital History by Jo Guldi (review)
Melvin Wevers
Technology and Culture. Published 2024-07-19. DOI: 10.1353/tech.2024.a933123 (https://doi.org/10.1353/tech.2024.a933123)
Abstract
Reviewed by:
The Dangerous Art of Text Mining: A Methodology for Digital History by Jo Guldi
Melvin Wevers
The Dangerous Art of Text Mining: A Methodology for Digital History By Jo Guldi. Cambridge: Cambridge University Press, 2023. Pp. 465.
In The Dangerous Art of Text Mining, historian Jo Guldi explores the application of text mining in historical research. Guldi uses text mining, a method for quantitatively analyzing digitized text, to examine British parliamentary records. She portrays the approach as a double-edged sword, embodying both art and hazard. Guldi characterizes text mining as an art that demands specialized expertise and flexible methodologies, aligning with historians’ heuristic and hermeneutic techniques. At the same time, she warns of its dangers, such as the potential for algorithms to foster overgeneralizations, amplify biases in data, and yield conclusions that overlook historical complexities and the nuanced interpretations of past actors. Despite the book’s ostensibly alarming title, Guldi’s critique of text mining is constructive, underscoring its capacity to enhance historical research if used with discernment, thereby avoiding the pitfalls of technological solutionism that mar some big data and early digital humanities projects.
The book is structured into three main sections. The first, “Towards a Smarter Data Science,” outlines Guldi’s methodological approach, advocating for a seamless integration of data science with historical research’s nuanced source criticism. The book adopts a somewhat antagonistic stance toward data science, portraying it as naive regarding data and algorithmic bias. Guldi posits that historians are uniquely positioned to prevent the uncritical application of algorithms to historical data. While data science studies often overlook such biases (as pointed out by R. Benjamin, Race after Technology, 2019; C. O’Neil, Weapons of Math Destruction, 2016), similar issues are evident in digital humanities and digital history. Think, for example, of the frequent use in digital history of topic modeling algorithms as exploratory tools without inspecting the degree of robustness of the models. Historians could also benefit from work in areas such as explainable AI, where initiatives are being advanced to augment the transparency of AI models, ensuring that users not only trust but also comprehend the underlying mechanisms and rationale of the algorithms employed in their research (for example, Molnar, Interpretable Machine Learning, 2022). Concurrently, information retrieval scholars have developed metrics to evaluate search strategy efficacy. Such metrics deserve more attention in Guldi’s book.
Guldi presents the strategy of critical search, which relies on the critical use of multiple algorithms to guide inquiries into large collections of historical materials. “Critical” here refers to the awareness of possible biases introduced by the data and the algorithms at every step of the process. Her language effectively resonates with historians, potentially increasing the method’s adoption. However, a more integrated approach, combining data science methods and validation strategies, would have facilitated a more cohesive amalgamation of the two disciplines. The book adeptly discusses why prediction is a contentious and difficult concept for many historians. Given the contingency of history and the sparsity of data, making predictions is notoriously difficult. Nonetheless, by narrowing modeling down to a mostly predictive tool, Guldi overlooks other ways in which modeling could benefit historians. A broader exploration of modeling’s other applications, such as guiding explanations and understanding data biases, would have been beneficial (for example, J. Epstein, Why Model?, 2008). The breadth of examples might have been balanced with a broader overview of methodological interventions, encompassing machine learning, sampling, and time series analysis. The focus on text mining is not necessarily a flaw, but many of the techniques recur throughout the book, making it somewhat repetitive. Generally, the book suffers from editing issues, including errors in footnotes, inconsistencies, and repetition of content.
The second part, the book’s strongest, presents case studies addressing temporal experience themes like memory, periodization, and event. Guldi adeptly uses historical theory as a conduit between text mining methods, data, and historical inquiry. She demonstrates how theories from scholars like Reinhart Koselleck, William Sewell, and Astrid Erll can translate historical data analysis goals into specific methodological applications. As such, the book effectively argues for a stronger integration of theory and method in advancing (digital) history.
About the journal:
Technology and Culture, the preeminent journal of the history of technology, draws on scholarship in diverse disciplines to publish insightful pieces intended for general readers as well as specialists. Subscribers include scientists, engineers, anthropologists, sociologists, economists, museum curators, archivists, scholars, librarians, educators, historians, and many others. In addition to scholarly essays, each issue features 30-40 book reviews and reviews of new museum exhibitions. To illuminate important debates and draw attention to specific topics, the journal occasionally publishes thematic issues. Technology and Culture is the official journal of the Society for the History of Technology (SHOT).