Identification of intimate partner violence from free text descriptions in social media.

Impact factor: 2.3 | JCR quartile: Q2 | Category: Social Sciences, Mathematical Methods

Journal of Computational Social Science | Pub date: 2022-11-01 | Epub date: 2022-05-07 | DOI: 10.1007/s42001-022-00166-8

Phan Trinh Ha, Rhea D'Silva, Ethan Chen, Mehmet Koyutürk, Günnur Karakurt

Pages 1207-1233. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12040337/pdf/

Citations: 0

Abstract

Intimate partner violence (IPV) is a significant public health problem that adversely affects the well-being of victims. IPV is often under-reported, and non-physical forms of violence may not be recognized as IPV, even by victims. With the increasing popularity of social media, and because of the anonymity that some platforms provide, people feel comfortable sharing descriptions of their relationship problems on social media. The content generated on these platforms can be useful in identifying IPV and in characterizing the prevalence, causes, consequences, and correlates of IPV in broad populations. However, these descriptions are free text, and no corpus of labeled data is available for large-scale computational and statistical analyses. Here, we use data from established questionnaires designed to collect self-reported data on IPV to train machine learning models that predict IPV from free text. Using Universal Sentence Encoder (USE) along with multiple machine learning algorithms (random forest, SVM, logistic regression, Naïve Bayes), we develop DetectIPV, a tool for detecting IPV in free text. Using DetectIPV, we comprehensively characterize the predictability of different types of violence (physical abuse, emotional abuse, sexual abuse) from free text. Our results show that a general model trained on examples of all violence types can identify IPV from free text with an area under the ROC curve (AUROC) of 89%. We also train type-specific models and observe that physical abuse can be identified with the greatest accuracy (AUROC 98%), while sexual abuse can be identified with high precision but relatively low recall. Although our results indicate that emotional abuse is the most challenging type to predict, DetectIPV can identify it with AUROC above 80%. These results establish DetectIPV as a tool that can reliably detect IPV in a range of applications, from flagging social media posts to detecting IPV in large text corpora for research purposes. DetectIPV is available as a web service at https://www.ipvlab.case.edu/ipvdetect/.
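The abstract reports results as AUROC, precision, and recall. The sketch below is not DetectIPV code. It illustrates how those metrics behave on a small set of hypothetical classifier scores, and why a model can reach high precision with low recall (as reported for sexual abuse) while still ranking examples well. The labels, scores, threshold, and function names are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision_recall(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are flagged as positive."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 1 = text describes abuse, 0 = it does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical classifier scores; two true positives receive low scores.
scores = [0.95, 0.80, 0.40, 0.30, 0.35, 0.20, 0.10, 0.05]

if __name__ == "__main__":
    print("AUROC:", auroc(labels, scores))
    print("precision, recall at 0.5:", precision_recall(labels, scores, 0.5))
```

With these made-up scores, every flagged item at threshold 0.5 is a true positive, but half of the true positives are missed. AUROC does not depend on the threshold, so it can remain high even when recall at one particular cut-off is low.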


Source journal

Journal of Computational Social Science (Social Sciences, Mathematical Methods)

CiteScore

6.20

Self-citation rate

6.20%

Articles published