Natural language model for automatic identification of Intimate Partner Violence reports from Twitter

Impact factor 2.3; JCR Q2, COMPUTER SCIENCE, THEORY & METHODS

Array, published 2022-09-01. DOI: 10.1016/j.array.2022.100217

Mohammed Ali Al-Garadi , Sangmi Kim , Yuting Guo , Elise Warren , Yuan-Chi Yang , Sahithi Lakamana , Abeed Sarker

Array, Volume 15, Article 100217. Full text: https://www.sciencedirect.com/science/article/pii/S2590005622000625; PDF (PMC): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/48/57/nihms-1882589.PMC10065459.pdf

Citations: 15

Abstract




Intimate partner violence (IPV) is a preventable public health problem that affects millions of people worldwide. Approximately one in four women are estimated to be or to have been victims of severe violence at some point in their lives, irrespective of age, ethnicity, and economic status. Victims often report IPV experiences on social media, and automatic detection of such reports via machine learning may enable improved surveillance and targeted distribution of support and/or interventions for those in need. However, no artificial intelligence systems for automatic detection currently exist, and we attempted to address this research gap. We collected posts from Twitter using a list of IPV-related keywords, manually reviewed subsets of retrieved posts, and prepared annotation guidelines to categorize tweets into IPV-report or non-IPV-report. We annotated 6,348 tweets in total, with an inter-annotator agreement (IAA) of 0.86 (Cohen's kappa) among 1,834 double-annotated tweets. The class distribution in the annotated dataset was highly imbalanced, with only 668 posts (∼11%) labeled as IPV-report. We then developed an effective natural language processing model to identify IPV-reporting tweets automatically. The developed model achieved classification F₁-scores of 0.76 for the IPV-report class and 0.97 for the non-IPV-report class. We conducted post-classification analyses to determine the causes of system errors and to ensure that the system did not exhibit biases in its decision making, particularly with respect to race and gender. Our automatic model can be an essential component for a proactive social media-based intervention and support framework, while also aiding population-level surveillance and large-scale cohort studies.
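The abstract reports annotation quality as Cohen's kappa and model quality as per-class F₁ rather than a single accuracy figure, which matters because only about 10.5% of the annotated tweets (668 of 6,348) are IPV reports. The following is a minimal Python sketch of the standard definitions of both metrics. It is not the authors' evaluation code, and the label lists are hypothetical toy data, not the study's annotations.

```python
from collections import Counter

IPV, NON = "IPV-report", "non-IPV-report"


def cohen_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement p_o: share of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement p_e: from each annotator's own label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both used one identical label throughout: kappa is 0/0
    return (p_o - p_e) / (1 - p_e)


def per_class_f1(gold, pred, label):
    """F1 for one class, treating `label` as positive and everything else as negative."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for illustration only; NOT the paper's data.
annotator_a = [IPV, IPV] + [NON] * 8
annotator_b = [IPV] + [NON] * 9
print(f"kappa = {cohen_kappa(annotator_a, annotator_b):.3f}")  # kappa = 0.615

gold = [IPV, IPV, IPV] + [NON] * 5
pred = [IPV, IPV, NON, IPV] + [NON] * 4
for label in (IPV, NON):
    print(f"F1[{label}] = {per_class_f1(gold, pred, label):.3f}")
# F1[IPV-report] = 0.667, then F1[non-IPV-report] = 0.800

# Class share from the counts reported in the abstract (668 of 6,348 tweets).
print(f"IPV-report share = {668 / 6348:.1%}")  # 10.5%, which the abstract rounds to ~11%
```

In the toy run, one error on each side costs the minority class more F₁ than the majority class, the same qualitative gap as 0.76 versus 0.97 in the abstract. In practice, scikit-learn's `cohen_kappa_score` and `f1_score(..., average=None)` compute the same quantities.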


Source journal: Array (Computer Science, General Computer Science)

CiteScore: 4.40

Self-citation rate: 0.00%

Review time: 45 days