Evaluating motivational interview quality using large language models and hidden Markov models.

IF 3.4; Zone 2 (Medicine); JCR Q2 (Psychiatry)

BMC Psychiatry. Pub Date: 2025-10-01. DOI: 10.1186/s12888-025-07391-1

Kyungho Lim, Young-Chul Jung, Byung-Hoon Kim

BMC Psychiatry 25(1): 908. Publication type: Journal Article.

Citations: 0

Abstract

Evaluating motivational interview quality using large language models and hidden Markov models.

Background: Motivational Interviewing (MI) is a counseling approach that promotes behavior change by eliciting "change talk" and minimizing "sustain talk." Traditional methods for assessing MI quality, such as manual coding, are labor-intensive, subjective, and difficult to scale. This study introduces an automated framework integrating large language models (LLMs) and Hidden Markov Models (HMMs) for evaluation of MI session quality.

Aims: This study evaluates the effectiveness of an LLM-HMM framework in predicting MI session quality and examines motivational state transitions in high- and low-quality sessions.

Method: A dataset of 40 MI sessions was analyzed. Client utterances were classified and numerically scored by an LLM based on their intention toward or away from change. With HMMs, we used these scores to examine the motivational state transitions across each session. Differences between high- and low-quality sessions were quantified by comparing transition matrices using Frobenius norms. Statistical significance was assessed via a permutation test. Predictive performance was evaluated using logistic regression with leave-one-out cross-validation (LOOCV), where transition matrix elements served as independent variables and interview quality as the dependent variable.
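The group comparison described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each session has already been reduced to a sequence of discrete motivational states and estimates transition matrices from observed transition frequencies, whereas the study fits HMMs to LLM-derived scores. The function names (transition_matrix, group_distance, permutation_test), the number of permutations, and the uniform fallback for unvisited states are all hypothetical choices made for illustration.

```python
import numpy as np

def transition_matrix(states, n_states):
    """Row-normalized matrix of observed state-to-state transition frequencies."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: states with no outgoing transitions get a uniform row.
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.divide(counts, row_sums, out=uniform, where=row_sums > 0)

def group_distance(mats, labels):
    """Frobenius norm of the difference between the two group-mean matrices."""
    mats = np.asarray(mats, dtype=float)
    labels = np.asarray(labels)
    diff = mats[labels == 1].mean(axis=0) - mats[labels == 0].mean(axis=0)
    return np.linalg.norm(diff, ord="fro")

def permutation_test(mats, labels, n_perm=2000, seed=0):
    """Shuffle quality labels to build a null distribution of the distance."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = group_distance(mats, labels)
    exceed = 0
    for _ in range(n_perm):
        if group_distance(mats, rng.permutation(labels)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Here labels would hold 1 for high-quality and 0 for low-quality sessions, and mats one transition matrix per session. Shuffling the labels keeps the group sizes fixed, so the null distribution reflects only the assignment of sessions to groups.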

Results: High-quality MI sessions exhibited fluid transitions between motivational states, whereas low-quality sessions showed persistence in resistance-oriented states. A statistically significant difference in transition matrices was observed between session groups (p < 0.001). The framework achieved a mean LOOCV accuracy of 0.80, demonstrating strong predictive performance in identifying MI session quality.
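A mean LOOCV accuracy such as the one reported above is the fraction of sessions classified correctly when each session is held out once and the classifier is refit on the rest. The sketch below shows that procedure under stated assumptions: the abstract does not describe the solver or any regularization, so the gradient-descent fit, the L2 penalty, the learning rate, and the function names (fit_logistic, predict, loocv_accuracy) are hypothetical. In the study's setting, each row of X would hold the flattened transition-matrix elements of one session and y its quality label.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=2000, l2=1e-2):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]  # the intercept is not penalized
        w -= lr * grad
    return w

def predict(w, X):
    """Predict class 1 when the linear score is non-negative."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (Xb @ w >= 0).astype(int)

def loocv_accuracy(X, y):
    """Hold out each sample once, refit on the rest, and average the hits."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    hits = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        w = fit_logistic(X[train], y[train])
        hits.append(predict(w, X[i:i + 1])[0] == y[i])
    return float(np.mean(hits))
```

With 40 sessions, this loop refits the model 40 times, and every held-out prediction moves the mean accuracy by 0.025.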

Conclusions: This study presents a scalable, objective alternative to manual MI evaluation. Future applications may include real-time therapist support, training, and prognosis prediction, pending further validation on field-collected data.

Source journal

BMC Psychiatry (Medicine: Psychiatry)

CiteScore: 5.90

Self-citation rate: 4.50%

Articles published: 716

Review time: 3-6 weeks

About the journal: BMC Psychiatry is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of psychiatric disorders, as well as related molecular genetics, pathophysiology, and epidemiology.