Algorithmic Classification of Psychiatric Disorder-Related Spontaneous Communication Using Large Language Model Embeddings: Algorithm Development and Validation.

IF 2

JMIR AI. Pub Date: 2025-05-30. DOI: 10.2196/67369

Ryan Allen Shewcraft, John Schwarz, Mariann Micsinai Balan

Citation: JMIR AI, volume 4, e67369 (2025). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223684/pdf/


Abstract

Background: Language, a crucial element of human communication, is shaped by the interplay of thoughts, emotions, and experiences. Psychiatric disorders affect cognitive and emotional processes, which in turn affect both what individuals with these disorders communicate and how they communicate it. Recent rapid advances in large language models (LLMs) suggest that quantitative analysis of language patterns could become a useful source of objective measures for diagnosing and monitoring psychiatric conditions.

Objective: This study aims to explore the use of LLMs in analyzing spontaneous communication to differentiate between psychiatric disorders. We seek to show that the latent LLM embedding space captures distinct linguistic markers that can be used to classify spontaneous communication associated with 7 different psychiatric disorders.

Methods: We used embeddings from the 7-billion-parameter Generative Representational Instruction Tuning Language Model to analyze more than 37,000 posts from subreddits dedicated to 7 common conditions: schizophrenia, borderline personality disorder (BPD), depression, attention-deficit/hyperactivity disorder (ADHD), anxiety, posttraumatic stress disorder (PTSD), and bipolar disorder. A multiclass Extreme Gradient Boosting classifier was trained on these embeddings, with cross-validation, to predict the origin subreddit of each post. Performance was evaluated using precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). In addition, we used Uniform Manifold Approximation and Projection dimensionality reduction to visualize how language relates across these psychiatric disorders.
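The cross-validated setup described in the Methods can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code: the tiny 2-dimensional vectors stand in for the LLM embeddings, a nearest-centroid rule stands in for the Extreme Gradient Boosting classifier, and the function names (`stratified_folds`, `out_of_fold_predictions`, and so on) are hypothetical. What it shows is the evaluation protocol itself: every post is predicted by a model that never saw that post during training.

```python
from collections import defaultdict
import math
import random


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k  # deal indices round-robin across folds
    return fold_of


def fit_centroids(vectors, labels):
    """Stand-in classifier (NOT XGBoost): compute the mean vector of each class."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def out_of_fold_predictions(vectors, labels, k=10, seed=0):
    """Predict every sample with a model trained on the other k-1 folds."""
    fold_of = stratified_folds(labels, k, seed)
    predictions = [None] * len(labels)
    for fold in range(k):
        train = [i for i in range(len(labels)) if fold_of[i] != fold]
        test = [i for i in range(len(labels)) if fold_of[i] == fold]
        centroids = fit_centroids([vectors[i] for i in train], [labels[i] for i in train])
        for i in test:
            predictions[i] = predict(centroids, vectors[i])
    return predictions


# Toy data (hypothetical): two well-separated "subreddits" with 20 posts each.
rng = random.Random(42)
vectors = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
vectors += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(20)]
labels = ["adhd"] * 20 + ["ptsd"] * 20

oof = out_of_fold_predictions(vectors, labels, k=10)
accuracy = sum(p == y for p, y in zip(oof, labels)) / len(labels)
```

Because all predictions are out-of-fold, metrics computed on `oof` estimate performance on unseen posts rather than on the training data. Stratifying the folds keeps each condition represented in every fold, which matters when subreddit sizes differ.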

Results: The 10-fold cross-validated Extreme Gradient Boosting classifier achieved support-weighted average precision, recall, and F1-score of 0.73 each, with an accuracy of 0.73. In one-versus-rest tasks, individual category AUCs ranged from 0.89 to 0.97, with a microaverage AUC of 0.95. ADHD posts were classified with the highest AUC (0.97), indicating distinct linguistic features, while BPD posts had the lowest AUC (0.89), suggesting greater linguistic overlap with other conditions. Consistent with the classifier results, the ADHD posts formed a more visually distinct cluster in the Uniform Manifold Approximation and Projection projection, while BPD overlapped with depression, anxiety, and schizophrenia. Comparisons with other state-of-the-art embedding methods, such as OpenAI's text-embedding-3-small (AUC=0.94) and Sentence-BERT (Sentence-Bidirectional Encoder Representations from Transformers; AUC=0.86), demonstrated superior performance of the Generative Representational Instruction Tuning Language Model-7B.
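A short worked example may help in reading these metrics. The sketch below uses only the Python standard library and made-up labels and scores (not the study's data) to compute support-weighted precision, recall, and F1, plus a one-versus-rest AUC through the rank (Mann-Whitney) formulation. It also illustrates a general identity: support-weighted recall always equals accuracy, which is why those two reported values coincide. The function names are hypothetical.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted precision, recall, and F1, plus plain accuracy."""
    support = Counter(y_true)      # number of true posts per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n  # each class is weighted by its share of true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(correct.values()) / n
    return precision, recall, f1, accuracy


def one_vs_rest_auc(y_true, scores, positive):
    """AUC for one class: chance that a positive post outscores a negative one.

    Ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up example with three classes.
y_true = ["adhd", "adhd", "adhd", "bpd", "bpd", "ptsd"]
y_pred = ["adhd", "adhd", "bpd", "bpd", "ptsd", "ptsd"]
precision, recall, f1, accuracy = weighted_scores(y_true, y_pred)

# Hypothetical classifier scores for the "adhd" class, one per post.
adhd_scores = [0.9, 0.8, 0.4, 0.5, 0.1, 0.2]
adhd_auc = one_vs_rest_auc(y_true, adhd_scores, "adhd")
```

In this toy case the weighted precision is 0.75 while weighted recall and accuracy are both 4/6, and the ADHD-versus-rest AUC is 8/9. AUC depends only on how the scores rank positive posts against negative ones, so a class can have a high AUC even when its thresholded precision and recall are modest, which is consistent with AUCs near 0.9 alongside an overall accuracy of 0.73.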

Conclusions: This study introduces an innovative use of LLMs in psychiatry, showcasing their potential to objectively examine language use for distinguishing between psychiatric disorders. The findings highlight the capability of LLMs to offer insights into the linguistic patterns characteristic of various conditions, paving the way for more efficient, patient-focused diagnostic and monitoring strategies. Future research should validate these results in clinically confirmed populations and investigate the implications of comorbidity and spectrum disorders.


