Natural Language Response Formats for Assessing Depression and Worry With Large Language Models: A Sequential Evaluation With Model Pre-Registration
Zhuojun Gu, Katarina Kjell, H Andrew Schwartz, Oscar Kjell
Assessment, published 2025-09-20. DOI: 10.1177/10731911251364022
Citations: 0
Abstract
Large language models can transform individuals' mental health descriptions into scores that correlate with rating scales at levels approaching theoretical upper limits. However, such analyses have combined word and text responses, with little known about their differences. We develop response formats ranging from closed-ended to open-ended: (a) select words from lists, or write (b) descriptive words, (c) phrases, or (d) texts. Participants answered questions about their depression/worry using these response formats and related rating scales. Language responses were transformed into word embeddings, which were used to train models predicting rating scale scores. We compare the validity (concurrent, incremental, face, discriminant, and external validity) and reliability (prospective sample and test-retest reliability) of the response formats. Using the Sequential Evaluation with Model Pre-Registration design, machine-learning models were trained on a development dataset (N = 963) and then pre-registered before being tested on a prospective sample (N = 145). The pre-registered models demonstrate strong validity and reliability, yielding high accuracy in the prospective sample (r = .60-.79). Additionally, the models demonstrated external validity against self-reported sick leave/healthcare visits, where the text format yielded the strongest correlations (higher than or equal to rating scales in 9 of 12 cases). The overall high validity and reliability across formats suggest the possibility of choosing formats according to clinical needs.
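The pipeline the abstract describes (embed language responses, train a supervised model against rating scale scores on a development set, then evaluate accuracy as the correlation with observed scores on a held-out prospective sample) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the embedding model (SentenceTransformer), regressor (ridge regression), and all data values are stand-ins.

```python
# Minimal sketch of the described pipeline: embed free-text responses,
# train a regression model on rating scale scores, and evaluate on a
# held-out "prospective" sample via Pearson correlation.
# Assumptions: SentenceTransformer embeddings and ridge regression stand
# in for whatever models the authors actually used; data are toy values.
from scipy.stats import pearsonr
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import RidgeCV

# Hypothetical data: free-text responses and rating scale scores.
dev_texts = [
    "I feel hopeless most days",
    "Mostly fine, with occasional stress",
    "I can't concentrate and sleep poorly",
]
dev_scores = [18.0, 4.0, 13.0]
prospective_texts = [
    "I can't stop worrying about everything",
    "Things have been going well lately",
]
prospective_scores = [15.0, 3.0]

# Transform language responses into embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X_dev = encoder.encode(dev_texts)
X_prosp = encoder.encode(prospective_texts)

# Train on the development dataset; in the paper's design, the fitted
# model is pre-registered before the prospective sample is touched.
model = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X_dev, dev_scores)

# Evaluate accuracy as the correlation between predicted and observed
# scores (analogous to the r = .60-.79 reported in the abstract).
preds = model.predict(X_prosp)
r, _ = pearsonr(preds, prospective_scores)
print(f"prospective-sample r = {r:.2f}")
```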
Journal Description:
Assessment publishes articles in the domain of applied clinical assessment. The emphasis of this journal is on the publication of information relevant to the use of assessment measures, including test development, validation, and interpretation practices. The scope of the journal includes research that can inform assessment practices in mental health, forensic, medical, and other applied settings. Papers that focus on the assessment of cognitive and neuropsychological functioning, personality, and psychopathology are invited. Most papers published in Assessment report the results of original empirical research; however, integrative review articles and scholarly case studies will also be considered.