Keynote Abstract: Machine Learning in Conflict Studies: Reflections on Ethics, Collaboration, and Ongoing Challenges

Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021). DOI: 10.18653/v1/2021.case-1.3

Kristine Eck


Citation count: 2

Abstract

Advances in machine learning are nothing short of revolutionary in their potential to analyze massive amounts of data and, in doing so, create new knowledge bases. But there is a responsibility in wielding the power to analyze these data, since the public attributes a high degree of confidence to results which are based on big datasets. In this keynote, I will first address our ethical imperative as scholars to “get it right.” This imperative relates not only to model precision but also to the quality of the underlying data, and to whether the models inadvertently reproduce or obscure political biases in the source material. In considering the ethical imperative to get it right, it is also important to define what is “right”: what is considered an acceptable threshold for classification success needs to be understood in light of the project’s objectives. I then reflect on the different topics and data which are sourced in this field. Much of the existing research has focused on identifying conflict events (e.g., battles), but scholars are also increasingly turning to ML approaches to address other facets of the conflict environment. Conflict event extraction has long been a challenge for the natural language processing (NLP) community because it requires sophisticated methods for defining event ontologies, creating language resources, and developing algorithmic approaches. NLP machine-learning tools are ill-adapted to the complex, often messy, and diverse data generated during conflicts. Relative to other types of NLP text corpora, conflicts tend to generate less textual data, and texts are generated non-systematically. Conflict-related texts are often lexically idiosyncratic and tend to be written differently across actors, periods, and conflicts. Event definition and adjudication present tough challenges in the context of conflict corpora. Topics which rely on other types of data may be better suited to NLP and machine learning methods. For example, Twitter and other social media data lend themselves well to studying hate speech, public opinion, social polarization, or discursive aspects of conflictual environments. Likewise, government-produced policy documents have typically been analyzed with historical, qualitative methods, but their standardized formats and quantity suggest that ML methods can provide new traction. ML approaches may also allow scholars to exploit local sources and multi-language sources to a greater degree than has been possible. Many challenges remain, and these are best addressed in collaborative projects which build on interdisciplinary expertise. Classification projects need to be anchored in the theoretical interests of scholars of political violence if the data they produce are to be put to analytical use. There are few ontologies for classification that adequately reflect conflict researchers’ interests, which highlights the need for conceptual as well as technical development.

