Capitalizing on natural language processing (NLP) to automate the evaluation of coach implementation fidelity in guided digital cognitive-behavioral therapy (GdCBT).
Nur Hani Zainal, Regina Eckhardt, Gavin N Rackoff, Ellen E Fitzsimmons-Craft, Elsa Rojas-Ashe, Craig Barr Taylor, Burkhardt Funk, Daniel Eisenberg, Denise E Wilfley, Michelle G Newman
Psychological Medicine, volume 55, e106. Published 2025-04-02. DOI: 10.1017/S0033291725000340 (https://doi.org/10.1017/S0033291725000340)
Abstract
Background: As the use of guided digitally-delivered cognitive-behavioral therapy (GdCBT) grows, pragmatic analytic tools are needed to evaluate coaches' implementation fidelity.
Aims: We evaluated how natural language processing (NLP) and machine learning (ML) methods might automate the monitoring of coaches' implementation fidelity to GdCBT delivered as part of a randomized controlled trial.
Method: Coaches served as guides to 6-month GdCBT with 3,381 assigned users with or at risk for anxiety, depression, or eating disorders. CBT-trained and supervised human coders used a rubric to rate the implementation fidelity of 13,529 coach-to-user messages. NLP methods abstracted data from text-based coach-to-user messages, and 11 ML models predicting coach implementation fidelity were evaluated.
Results: Inter-rater agreement by human coders was excellent (intra-class correlation coefficient = .980-.992). Coaches achieved behavioral targets at the start of the GdCBT and maintained strong fidelity throughout most subsequent messages. Coaches also avoided prohibited actions (e.g. reinforcing users' avoidance). Sentiment analyses generally indicated a higher frequency of coach-delivered positive than negative sentiment words and predicted coach implementation fidelity with acceptable performance metrics (e.g. area under the receiver operating characteristic curve [AUC] = 74.48%). The final best-performing ML algorithms that included a more comprehensive set of NLP features performed well (e.g. AUC = 76.06%).
Conclusions: NLP and ML tools could help clinical supervisors automate monitoring of coaches' implementation fidelity to GdCBT. These tools could maximize allocation of scarce resources by reducing the personnel time needed to measure fidelity, potentially freeing up more time for high-quality clinical care.
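To make the Method and Results more concrete, the following is a minimal sketch of the general idea: derive a sentiment feature from each coach-to-user message, then check how well that feature separates high-fidelity from low-fidelity messages using the AUC. It is not the authors' pipeline. The word lists, the example messages, the fidelity labels, and the function names `sentiment_score` and `auc` are all hypothetical and chosen only for illustration; the abstract does not specify the sentiment lexicon or the 11 ML models used.

```python
import re

# Hypothetical mini-lexicons (assumption): the abstract does not name the
# sentiment dictionary used in the study.
POSITIVE = {"great", "good", "progress", "proud", "well"}
NEGATIVE = {"bad", "fail", "wrong", "worse", "never"}


def sentiment_score(message: str) -> int:
    """Return the count of positive words minus the count of negative words."""
    tokens = re.findall(r"[a-z']+", message.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy messages with made-up human fidelity ratings (1 = adherent, 0 = not).
messages = [
    "Great progress this week, I am proud of you!",
    "You did well on the exposure exercise.",
    "That was wrong, you never follow the plan.",
    "Please log in again.",
]
labels = [1, 1, 0, 0]

scores = [sentiment_score(m) for m in messages]
print("sentiment scores:", scores)
print("AUC:", auc(scores, labels))
```

On this toy data the feature separates the two classes perfectly, which real data would not; the study reports an AUC of 74.48% for sentiment-based prediction and 76.06% for the best-performing models with a broader NLP feature set.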
About the journal:
Now in its fifth decade of publication, Psychological Medicine is a leading international journal in the fields of psychiatry, related aspects of psychology and basic sciences. From 2014, there are 16 issues a year, each featuring original articles reporting key research being undertaken worldwide, together with shorter editorials by distinguished scholars and an important book review section. The journal's success is clearly demonstrated by a consistently high impact factor.