Jinkyung Katie Park, Pinxuan Alina Yu, Vignesh Krishnan, Huaye Li, Linda A Reddy, Vivek K Singh
{"title":"Designing Psychologically Grounded Artificial Intelligence for Supporting Bystander-Based Cyberaggression Intervention: Mixed Methods Exploratory Study.","authors":"Jinkyung Katie Park, Pinxuan Alina Yu, Vignesh Krishnan, Huaye Li, Linda A Reddy, Vivek K Singh","doi":"10.2196/84391","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Cyberaggression poses a growing threat to mental health, contributing to increased distress, reduced self-esteem, and other adverse psychosocial outcomes. Although bystander intervention can mitigate the escalation and impact of cyberaggression, individuals often lack the confidence, strategies, or language to respond effectively in these high-stakes online interactions. Advances in generative artificial intelligence (AI) present a novel opportunity to facilitate digital behavior change by assisting bystanders with contextually appropriate, theory-informed intervention messages that promote safer online environments and support mental well-being.</p><p><strong>Objective: </strong>This mixed methods design study aimed to explore the feasibility of using generative AI to support bystander intervention in cyberaggression on social media. Specifically, we examined whether AI can generate effective responses aligned with established intervention strategies and how these responses are perceived in terms of their potential to de-escalate online harm and foster behavior change.</p><p><strong>Methods: </strong>We collected 1000 real-world cyberaggression examples from public social media datasets and generated bystander intervention responses using 3 distinct prompt strategies: a generic policy reminder, a baseline GPT prompt, and a theory-driven GPT prompt (AllyGPT). To evaluate the responses, we conducted computational linguistic analyses to assess their psycholinguistic features and carried out a mixed methods evaluation. Three trained coders rated each message on favorability, conversational impact, and potential to change behavior and later participated in semistructured interviews to reflect on their evaluation process and perceptions of intervention effectiveness.</p><p><strong>Results: </strong>Linguistic analyses revealed that baseline GPT responses exhibited more emotionally positive and authentic language compared to AllyGPT responses, which showed a more analytical and assertive tone. Policy reminder messages were linguistically rigid and lacked emotional nuance. Human evaluation results showed that AllyGPT responses received the highest effectiveness ratings for low-incivil cyberaggression cases in 2 dimensions (favorability and changing behavior), and baseline GPT works better for mid and high levels for all effectiveness dimensions. For medium- and high-incivility aggressions, baseline GPT responses received the highest ratings across all 3 dimensions of effectiveness (favorability, discussion-shifting potential, and likelihood of changing bullying behavior), followed by AllyGPT, with policy reminders rated lowest. Qualitative feedback further emphasized that baseline GPT responses were perceived as natural and inclusive, while AllyGPT responses, although grounded in psychological theory, were sometimes viewed as overly direct. Policy reminders were considered clear but lacked persuasive impact.</p><p><strong>Conclusions: </strong>Our work showed that designing effective AI-generated bystander interventions requires a deep sensitivity to platform culture, social context, and user expectations. 
By combining psychological theory with adaptive, conversational design and ongoing feedback loops, future systems can better support bystanders, delivering interventions that are not only contextually appropriate but also socially resonant and behaviorally impactful. As such, this work serves as a foundation for scalable, human-centered AI systems that promote safer online spaces and users' mental well-being.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"10 ","pages":"e84391"},"PeriodicalIF":2.0000,"publicationDate":"2026-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13075537/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Formative Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/84391","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Cyberaggression poses a growing threat to mental health, contributing to increased distress, reduced self-esteem, and other adverse psychosocial outcomes. Although bystander intervention can mitigate the escalation and impact of cyberaggression, individuals often lack the confidence, strategies, or language to respond effectively in these high-stakes online interactions. Advances in generative artificial intelligence (AI) present a novel opportunity to facilitate digital behavior change by assisting bystanders with contextually appropriate, theory-informed intervention messages that promote safer online environments and support mental well-being.
Objective: This mixed methods design study aimed to explore the feasibility of using generative AI to support bystander intervention in cyberaggression on social media. Specifically, we examined whether AI can generate effective responses aligned with established intervention strategies and how these responses are perceived in terms of their potential to de-escalate online harm and foster behavior change.
Methods: We collected 1000 real-world cyberaggression examples from public social media datasets and generated bystander intervention responses using 3 distinct prompt strategies: a generic policy reminder, a baseline GPT prompt, and a theory-driven GPT prompt (AllyGPT). To evaluate the responses, we conducted computational linguistic analyses to assess their psycholinguistic features and carried out a mixed methods evaluation. Three trained coders rated each message on favorability, conversational impact, and potential to change behavior and later participated in semistructured interviews to reflect on their evaluation process and perceptions of intervention effectiveness.
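To make the prompting setup concrete, below is a minimal Python sketch of how the three response conditions could be produced with the OpenAI API. The prompt wording, variable names, and model choice are illustrative assumptions for this sketch, not the authors' actual prompts or configuration.

```python
# Hypothetical sketch of the three prompt strategies described in the Methods.
# POLICY_REMINDER, BASELINE_PROMPT, ALLYGPT_PROMPT, and the model choice are
# illustrative assumptions, not the study's exact prompts or settings.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Strategy 1: a fixed, generic policy reminder (no generation needed).
POLICY_REMINDER = (
    "Reminder: this community does not allow harassment or personal attacks. "
    "Please review the platform's community guidelines."
)

# Strategy 2: a baseline prompt that simply asks for a bystander reply.
BASELINE_PROMPT = (
    "You are a bystander on social media. Write a brief, civil reply to the "
    "following aggressive comment that discourages further aggression:\n\n{comment}"
)

# Strategy 3: a theory-driven, AllyGPT-style prompt that encodes
# bystander-intervention principles (acknowledge the target, name the harm,
# redirect the conversation constructively).
ALLYGPT_PROMPT = (
    "You are a supportive bystander. Using bystander-intervention principles, "
    "(1) acknowledge the target's feelings, (2) label the comment below as "
    "harmful without attacking its author, and (3) redirect the conversation "
    "constructively. Aggressive comment:\n\n{comment}"
)


def generate_reply(prompt_template: str, comment: str) -> str:
    """Generate one bystander response for a given aggressive comment."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt_template.format(comment=comment)}],
        temperature=0.7,
    )
    return response.choices[0].message.content


comment = "Nobody cares what you think, just log off already."
replies = {
    "policy_reminder": POLICY_REMINDER,
    "baseline_gpt": generate_reply(BASELINE_PROMPT, comment),
    "allygpt": generate_reply(ALLYGPT_PROMPT, comment),
}
```

In the study, each of the 1000 collected cyberaggression examples would be run through all three conditions so that the resulting response sets can be compared linguistically and by human raters.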
Results: Linguistic analyses revealed that baseline GPT responses used more emotionally positive and authentic language than AllyGPT responses, which had a more analytical and assertive tone. Policy reminder messages were linguistically rigid and lacked emotional nuance. Human evaluation showed that for low-incivility cyberaggression cases, AllyGPT responses received the highest effectiveness ratings on 2 dimensions (favorability and likelihood of changing behavior). For medium- and high-incivility cases, baseline GPT responses received the highest ratings across all 3 effectiveness dimensions (favorability, discussion-shifting potential, and likelihood of changing bullying behavior), followed by AllyGPT, with policy reminders rated lowest. Qualitative feedback further emphasized that baseline GPT responses were perceived as natural and inclusive, whereas AllyGPT responses, although grounded in psychological theory, were sometimes viewed as overly direct. Policy reminders were considered clear but lacked persuasive impact.
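As a rough illustration of the kind of psycholinguistic comparison reported above, the sketch below scores the emotional tone of responses from each condition. It uses NLTK's VADER sentiment analyzer as a freely available stand-in for the study's psycholinguistic feature analysis, and the example responses are invented for demonstration only.

```python
# Illustrative tone comparison across response conditions.
# VADER sentiment is a stand-in metric, not the instrument used in the study.
import statistics

from nltk.sentiment import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")

analyzer = SentimentIntensityAnalyzer()


def mean_tone(responses: list[str]) -> float:
    """Average compound sentiment (-1 = most negative, +1 = most positive)."""
    return statistics.mean(analyzer.polarity_scores(r)["compound"] for r in responses)


# Invented example responses, one small set per condition.
conditions = {
    "policy_reminder": ["Please review the community guidelines before posting."],
    "baseline_gpt": ["I hear that you're frustrated, but let's keep this respectful."],
    "allygpt": ["That comment is hurtful. Let's focus on the actual issue instead."],
}

for name, responses in conditions.items():
    print(f"{name}: mean tone = {mean_tone(responses):.2f}")
```

In practice, the per-condition scores would be aggregated over all generated responses and compared statistically, alongside the human ratings of favorability, discussion-shifting potential, and likelihood of changing behavior.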
Conclusions: Our work showed that designing effective AI-generated bystander interventions requires a deep sensitivity to platform culture, social context, and user expectations. By combining psychological theory with adaptive, conversational design and ongoing feedback loops, future systems can better support bystanders, delivering interventions that are not only contextually appropriate but also socially resonant and behaviorally impactful. As such, this work serves as a foundation for scalable, human-centered AI systems that promote safer online spaces and users' mental well-being.