Identifying Substance Use and High-Risk Sexual Behavior Among Sexual and Gender Minority Youth by Using Mobile Phone Data: Development and Validation Study.

Mehrab Beikzadeh, Ian W Holloway, Kimmo Kärkkäinen, Chenglin Hong, Cory Cascalheira, Elizabeth S C Wu, Callisto Boka, Alexandra C Avendaño, Elizabeth A Yonko, Majid Sarrafzadeh
Online Journal of Public Health Informatics. 2025;17:e68013. Published August 12, 2025. Journal article. DOI: 10.2196/68013. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360732/pdf/

Abstract

Background: Sexual and gender minority (SGM) individuals are at higher risk of substance use and sexually transmitted infections than their non-SGM peers. Passively collecting mobile phone usage data may open new opportunities for personalizing interventions, because behavioral risks could be identified without user input.

Objective: This study aimed to determine (1) whether passively sensed mobile phone data can be used to identify substance use and sexual risk behaviors for sexually transmitted infection (STI) and HIV transmission among young SGM individuals who have sex with men, (2) which outcomes can be predicted with a high level of accuracy, and (3) which passive data sources are most predictive of these outcomes.

Methods: We developed a mobile phone app to collect participants' messaging, location, and app use data and trained machine learning models to predict risk behaviors for STI and HIV transmission. Using Scikit-learn, we trained logistic regression and gradient boosting classifiers with a simple linear model specification to predict participants' substance use and sexual behaviors (ie, condomless anal sex, number of sexual partners, and methamphetamine use); self-report questionnaires served as the reference for validating these predictions. F1-scores were used to quantify the models' predictive performance for each data source and each combination of sources. Differences in text, location, app use, and Linguistic Inquiry and Word Count (LIWC) features between outcome groups were assessed using independent t tests, with significance set at P<.05.
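The abstract does not specify the feature definitions, hyperparameters, or validation split. The following is a minimal sketch, not the authors' code, of how a per-source comparison of this kind can be set up with scikit-learn. It assumes hypothetical feature groups (`FEATURE_GROUPS`), synthetic data standing in for the 82 participants, and 5-fold stratified cross-validation, and it reports the mean F1-score for each model and each combination of data sources.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical mapping of passive data sources to feature columns (assumption).
FEATURE_GROUPS = {
    "text": [0, 1, 2, 3],
    "location": [4, 5],
    "app_use": [6, 7, 8],
}


def make_synthetic_data(n=82, seed=0):
    """Synthetic stand-in for per-participant features and a binary outcome."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 9))
    # Outcome loosely driven by one text feature and one app-use feature.
    logits = 1.5 * X[:, 0] + 1.0 * X[:, 6] + rng.normal(scale=0.5, size=n)
    y = (logits > 0.5).astype(int)
    return X, y


def evaluate_sources(X, y, seed=0):
    """Mean cross-validated F1 for every model and every combination of sources."""
    models = {
        # Scaling matters for the linear model, not for tree-based boosting.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    names = list(FEATURE_GROUPS)
    results = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Gather the columns of every source in this combination.
            cols = [c for name in combo for c in FEATURE_GROUPS[name]]
            for model_name, model in models.items():
                # cross_val_score clones the model, so reuse across loops is safe.
                scores = cross_val_score(model, X[:, cols], y, cv=cv, scoring="f1")
                results[(combo, model_name)] = float(scores.mean())
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    results = evaluate_sources(X, y)
    for (combo, model_name), f1 in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{'+'.join(combo):<25} {model_name:<20} F1={f1:.2f}")
```

Stratified folds keep each fold's positive rate close to the overall rate, which matters when an outcome is uncommon in a sample of this size. With three sources there are seven combinations, so the loop produces 14 model-by-source scores.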

Results: Among participants (n=82) who identified as SGM, were sexually active, and reported recent substance use, our model was highly predictive of methamphetamine use and of having ≥6 sexual partners (F1-scores as high as 0.83 and 0.69, respectively). The model was less predictive of condomless anal sex (highest F1-score 0.38). Overall, text-based features were the most predictive, but app use and location data improved predictive accuracy, particularly for detecting ≥6 sexual partners. Methamphetamine use was significantly associated with dating app use (P=.01) and use of sex-related words (P=.002). Compared with participants with fewer sex partners, those with ≥6 sex partners showed, on average, greater dating app use (P=.02), greater use of sex-related words (P=.001), and greater distance traveled from home (P=.03). Methamphetamine users were more likely to use social words (P=.002) and affect words (P=.003) and less likely to use drive-related words (P=.02). Participants with ≥6 partners were more likely to use social, affect, and cognitive process-related words (P=.003 and P=.004, respectively).
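The associations above come from independent t tests comparing a feature's values between participants who did and did not report a behavior. A minimal sketch using SciPy's `scipy.stats.ttest_ind` follows; the feature (daily dating-app minutes), the helper name `compare_feature`, and the synthetic sample are hypothetical, since the abstract does not define the features or say which t test variant was used.

```python
import numpy as np
from scipy import stats


def compare_feature(feature, outcome, alpha=0.05):
    """Independent two-sample t test of one feature between outcome groups.

    feature: per-participant values (hypothetical, e.g. daily dating-app minutes)
    outcome: binary labels (1 = behavior reported, 0 = not reported)
    """
    feature = np.asarray(feature, dtype=float)
    outcome = np.asarray(outcome)
    group_yes = feature[outcome == 1]
    group_no = feature[outcome == 0]
    # Default is Student's t test (equal variances); equal_var=False gives Welch's test.
    result = stats.ttest_ind(group_yes, group_no)
    return {
        "mean_yes": float(group_yes.mean()),
        "mean_no": float(group_no.mean()),
        "t": float(result.statistic),
        "p": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic 82-participant sample: 20 report the behavior, 62 do not.
    outcome = np.array([1] * 20 + [0] * 62)
    minutes = np.where(outcome == 1, rng.normal(45, 15, 82), rng.normal(30, 15, 82))
    print(compare_feature(minutes, outcome))
```

Running one such test per feature and outcome, as the reported P values suggest, multiplies the number of comparisons, so a fixed P<.05 threshold is best read as exploratory.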

Conclusions: Our results show that passively collected mobile phone data may be useful in detecting sexual risk behaviors. Expanding data collection may improve the results further, as certain behaviors, such as injection drug use, were quite rare in the study sample. These models may be used to personalize STI and HIV prevention as well as substance use harm reduction interventions.
