Moving Toward Meaningful Evaluations of Monitoring in e-Mental Health Based on the Case of a Web-Based Grief Service for Older Mourners: Mixed Methods Study.
Lena Brandl, Stephanie Jansen-Kosterink, Jeannette Brodbeck, Sofia Jacinto, Bettina Mooser, Dirk Heylen
JMIR Formative Research, volume 8, e63262. Published 2024-11-28. DOI: 10.2196/63262 (https://doi.org/10.2196/63262)
Abstract
Background: Artificial intelligence (AI) tools hold much promise for mental health care by increasing the scalability and accessibility of care. However, current development and evaluation practices of AI tools limit their meaningfulness for health care contexts and therefore also the practical usefulness of such tools for professionals and clients alike.
Objective: The aim of this study is to demonstrate the evaluation of an AI monitoring tool that detects the need for more intensive care in a web-based grief intervention for older mourners who have lost their spouse, with the goal of moving toward meaningful evaluation of AI tools in e-mental health.
Methods: We leveraged the insights from three evaluation approaches: (1) the F1-score evaluated the tool's capacity to classify user monitoring parameters as either in need of more intensive support or recommendable to continue using the web-based grief intervention as is; (2) we used linear regression to assess the predictive value of users' monitoring parameters for clinical changes in grief, depression, and loneliness over the course of a 10-week intervention; and (3) we collected qualitative experience data from e-coaches (N=4) who incorporated the monitoring in their weekly email guidance during the 10-week intervention.
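As a minimal sketch of approach (1), the F1-score can be computed from paired binary decisions as the harmonic mean of precision and recall. The labels below are hypothetical toy data, not study data, and the choice of which decision counts as the positive class (here `1`) is an assumption, since the abstract does not state it.

```python
def f1_score(reference, predicted, positive=1):
    """Return the F1-score of `predicted` against `reference` for one class.

    F1 is the harmonic mean of precision and recall, which simplifies to
    2*TP / (2*TP + FP + FN).
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)
    denominator = 2 * tp + fp + fn
    # With no true positives, false positives, or false negatives, F1 is
    # undefined; returning 0.0 is a common convention and an assumption here.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical decisions: 1 = "continue as is", 0 = "more intensive support".
reference_decisions = [1, 1, 1, 0, 1, 0, 1, 1]
tool_decisions = [1, 1, 0, 0, 1, 1, 1, 1]

print(round(f1_score(reference_decisions, tool_decisions), 3))  # 0.833
```

In this toy case there are 5 true positives, 1 false positive, and 1 false negative, so precision and recall are both 5/6.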
Results: Based on n=174 binary recommendation decisions, the F1-score of the monitoring tool was 0.91. Due to minimal change in depression and loneliness scores after the 10-week intervention, only 1 linear regression was conducted. The difference score in grief before and after the intervention was included as the dependent variable. Participants' (N=21) mean score on the self-report monitoring and the estimated slope of individually fitted growth curves and its standard error (ie, participants' response pattern to the monitoring questions) were used as predictors. Only the mean monitoring score exhibited predictive value for the observed change in grief (regression coefficient=1.19, SE 0.33; t16=3.58, P=.002). The e-coaches appreciated the monitoring tool as an opportunity to confirm their initial impression about intervention participants, personalize their email guidance, and detect when participants' mental health deteriorated during the intervention.
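The three predictors described above can be sketched for a single participant as follows. This is an illustration under stated assumptions: the growth curve is taken to be a straight line fitted by ordinary least squares over equally spaced measurement occasions, which the abstract does not confirm, and the function name `monitoring_features` and the example scores are hypothetical.

```python
import math


def monitoring_features(scores):
    """Return (mean score, fitted slope, standard error of the slope).

    Assumes one score per equally spaced occasion (0, 1, 2, ...) and a
    linear growth curve fitted by ordinary least squares.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("at least 3 scores are needed for a slope standard error")
    occasions = range(n)
    mean_occasion = (n - 1) / 2
    mean_score = sum(scores) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((t - mean_occasion) ** 2 for t in occasions)
    sxy = sum((t - mean_occasion) * (s - mean_score) for t, s in zip(occasions, scores))

    slope = sxy / sxx
    intercept = mean_score - slope * mean_occasion

    # Residual variance with n - 2 degrees of freedom (intercept and slope).
    rss = sum((s - (intercept + slope * t)) ** 2 for t, s in zip(occasions, scores))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return mean_score, slope, slope_se


# Hypothetical monitoring scores for one participant over five occasions.
mean_score, slope, slope_se = monitoring_features([2, 4, 3, 5, 6])
print(mean_score, round(slope, 2), round(slope_se, 3))
```

Computing these features per participant gives one row per person, which could then enter a regression on the pre-post difference score in grief.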
Conclusions: The monitoring tool evaluated in this paper identified a need for more intensive support reasonably well in a nonclinical sample of older mourners, had some predictive value for the change in grief symptoms during a 10-week intervention, and was appreciated as an additional source of mental health information by e-coaches who supported mourners during the intervention. Each evaluation approach in this paper came with its own set of limitations, including (1) skewed class distributions in prediction tasks based on real-life health data and (2) choosing meaningful statistical analyses based on clinical trial designs that are not targeted at evaluating AI tools. However, combining multiple evaluation methods facilitates drawing meaningful conclusions about the clinical value of AI monitoring tools for their intended mental health context.
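Limitation (1) can be illustrated with a small sketch. When one decision dominates the data, a rule that always outputs the majority decision can reach a high F1-score for that class while never detecting the minority class. The 160/14 split below is a hypothetical example chosen only to sum to 174; the study's actual class distribution is not given in the abstract.

```python
def f1_for_class(reference, predicted, target):
    """F1-score treating `target` as the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == target and p == target)
    fp = sum(1 for r, p in pairs if r != target and p == target)
    fn = sum(1 for r, p in pairs if r == target and p != target)
    denominator = 2 * tp + fp + fn
    # Convention (an assumption): report 0.0 when the score is undefined.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical skewed data: 160 "continue" and 14 "intensify" decisions.
reference = ["continue"] * 160 + ["intensify"] * 14

# A trivial baseline that never recommends more intensive support.
always_continue = ["continue"] * len(reference)

f1_majority = f1_for_class(reference, always_continue, "continue")
f1_minority = f1_for_class(reference, always_continue, "intensify")
macro_f1 = (f1_majority + f1_minority) / 2

print(round(f1_majority, 3), f1_minority, round(macro_f1, 3))
```

Under these assumed counts the baseline scores about 0.96 on the majority class and 0 on the minority class, which is why reporting per-class or macro-averaged scores alongside a single F1-score can make an evaluation easier to interpret.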