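Perplexity and proximity: Large language model perplexity complements semantic distance metrics for the detection of incoherent speech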

IF 4.5 | Zone 2 (Medicine) | JCR Q2 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Biomedical Informatics | Pub Date: 2025-08-21 | DOI: 10.1016/j.jbi.2025.104899

Weizhe Xu, Serguei Pakhomov, Patrick Heagerty, Eric Horvitz, Ellen R. Bradley, Josh Woolley, Andrew Campbell, Alex Cohen, Dror Ben-Zeev, Trevor Cohen

Volume 170, Article 104899 | Journal Article | Full text: https://www.sciencedirect.com/science/article/pii/S1532046425001285

Citations: 0

Abstract


Perplexity and proximity: Large language model perplexity complements semantic distance metrics for the detection of incoherent speech


Objective

Semantic coherence in speech is characterized by a logical, connected flow of ideas. A lack of coherence in speech may reflect disorganized thinking, a core feature of psychosis in schizophrenia spectrum disorders (SSDs). Developing tools that could help with automated assessment of semantic coherence in language could facilitate early detection of SSDs and improved monitoring of symptoms, enabling more timely intervention. Large language models (LLMs) have demonstrated strong capabilities on numerous language-centric tasks and have shown promise for analyzing semantic coherence due to the natural fit between their innate measures of language perplexity and the surprising turns that incoherent narrative often takes. This study aims to develop a novel representation and associated measure of semantic coherence using LLM-based perplexity metrics and to compare this measure with traditional vector distance-based coherence metrics.

Method

We evaluated “bag” and “chain” models based on LLM perplexities as measures of semantic coherence. Regression models were trained using both single and paired combinations of perplexity- and proximity-based features to predict human ratings of semantic coherence using standardized instruments. Performance was evaluated on held-out examples from a training set of speeches from individuals experiencing psychotic symptoms and a test set of clinical interviews with patients diagnosed with SSDs, both with labels from human assessments of disorganized thinking severity.
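The two feature families named above can be illustrated with their generic building blocks. The following is a minimal sketch, not the authors' method: the abstract does not describe how the "bag" and "chain" models are constructed, so the code only shows the standard definition of perplexity (computed from per-token natural-log probabilities that a language model would supply) and one common proximity measure (mean cosine similarity between consecutive sentence vectors). The function names and the choice of consecutive-pair averaging are assumptions made for illustration.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability of the tokens).

    Assumes the log-probabilities come from some language model;
    obtaining them is outside the scope of this sketch.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_consecutive_similarity(vectors: Sequence[Sequence[float]]) -> float:
    """Hypothetical proximity feature: average similarity of adjacent vectors."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    sims = [cosine_similarity(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims)
```

Under these definitions, higher perplexity means the text was more surprising to the model, while lower mean similarity means adjacent units of speech are semantically further apart; both would point toward less coherent speech.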

Results

The best performance was achieved using a combination of perplexity and proximity features, yielding a Spearman correlation with human ratings of 0.61 (vs. 0.56 with proximity features alone) on leave-one-out cross-validation in the training set, and 0.54 (vs. 0.52 with proximity features alone) on the test set.
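The evaluation protocol described here (leave-one-out cross-validation scored by Spearman correlation with human ratings) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the abstract does not say which regression model was used, so ordinary least squares stands in for it, and all function names are hypothetical. Each sample is held out once, a model is fit on the remaining samples, and the held-out predictions are then rank-correlated with the ratings.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fit_ols(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Least-squares coefficients [intercept, w1, ...] via the normal equations."""
    design = [[1.0, *row] for row in rows]
    p = len(design[0])
    # Augmented matrix [X^T X | X^T y], solved by Gauss-Jordan elimination.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(design, targets))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    return coefs[0] + sum(w * x for w, x in zip(coefs[1:], row))


def loocv_spearman(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> float:
    """Hold out each sample once, predict it, then rank-correlate with targets."""
    preds = []
    for i in range(len(rows)):
        train_rows = list(rows[:i]) + list(rows[i + 1:])
        train_targets = list(targets[:i]) + list(targets[i + 1:])
        preds.append(predict(fit_ols(train_rows, train_targets), rows[i]))
    return spearman(preds, targets)
```

In this sketch each row would hold one perplexity feature and one proximity feature for a transcript, and `targets` would hold the human ratings; comparing `loocv_spearman` on both features against the same call on the proximity column alone mirrors the kind of comparison reported above.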

Conclusion

We developed novel methods for assessing semantic coherence using LLM perplexities and found them complementary to proximity-based methods. Combined, these methods showed improved performance across two datasets, highlighting the potential of LLMs to enhance automated diagnosis and monitoring of SSDs.


Source journal

Journal of Biomedical Informatics (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 8.90

Self-citation rate: 6.70%

Articles published: 243

Review time: 32 days

About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.