Fumeng Li, Nan Zhao
DOI: 10.1002/pchj.70048
PsyCh Journal (JCR Q3, Psychology, Multidisciplinary; Impact Factor 1.3), published 2025-08-30.
The Assessment of Body Image Based on Large Language Model.
Assessing adolescent body image is crucial for mental health interventions, yet traditional methods suffer from limited dimensional coverage, poor dynamic tracking, and weak ecological validity. To address these gaps, this study proposes a multidimensional evaluation using large language models (LLMs) and compares its criterion validity against a dictionary-based method and expert ratings. We defined four dimensions (perception, positive attitude, negative attitude, and behavior) by reviewing the body-image literature and built a validated dictionary through expert ratings and iterative refinement. A four-step prompt-engineering process, incorporating role-playing and other optimization techniques, produced tailored prompts for LLM-based recognition. To validate these tools, we collected self-reported texts and scale scores from 194 university students, performed semantic analyses with Llama-3.1-70B, Qwen-Max, and DeepSeek-R1 using these prompts, and confirmed ecological validity on social media posts. Results indicate that our multidimensional dictionary correlated significantly with expert ratings across all four dimensions (r = 0.515-0.625), providing a solid benchmark. LLM-based assessments then outperformed both the dictionary and human ratings, with zero-shot LLMs achieving r = 0.664 in positive attitude (vs. expert r = 0.657) and DeepSeek-R1 reaching r = 0.722 in perception. Role-playing techniques significantly improved the validity in the perception dimension (Δr = +0.117). Consistency checks revealed that the DeepSeek model reduced error dispersion in extreme score ranges by 48.4% compared to human ratings, with the 95% consistency limits covering the fluctuations of human scores. Incremental validity analysis showed that LLMs could replace human evaluations in the perception dimension (ΔR² = 0.220). In ecological validity checks, the Qwen model achieved a correlation of 0.651 in the social media behavior dimension, 53.1% higher than the dictionary method.
We found that LLMs demonstrated significant advantages in the multidimensional assessment of body image, offering a new intelligent approach to mental health measurement.
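The abstract's validity statistics rest on three standard computations: Pearson correlation for criterion validity, 95% limits of agreement for consistency checks, and ΔR² for incremental validity. A minimal sketch of how such statistics could be computed is given below, using synthetic illustrative data rather than the study's actual scores; all variable names and values are assumptions for demonstration only.

```python
import numpy as np

# Synthetic illustrative data standing in for expert ratings and LLM scores
# on 194 texts (values are NOT the study's data).
rng = np.random.default_rng(0)
expert = rng.normal(3.0, 1.0, 194)            # expert ratings on one dimension
llm = expert + rng.normal(0.0, 0.8, 194)      # hypothetical LLM scores
dictionary = expert + rng.normal(0.0, 1.2, 194)  # hypothetical dictionary scores

# Criterion validity: Pearson r between LLM scores and the expert benchmark.
r = np.corrcoef(llm, expert)[0, 1]

# Consistency check: Bland-Altman-style 95% limits of agreement,
# mean difference +/- 1.96 standard deviations of the differences.
diff = llm - expert
loa = (diff.mean() - 1.96 * diff.std(ddof=1),
       diff.mean() + 1.96 * diff.std(ddof=1))

def r_squared(predictors, y):
    """R^2 of an ordinary-least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Incremental validity: gain in explained variance when LLM scores are
# added on top of the dictionary baseline.
delta_r2 = r_squared([dictionary, llm], expert) - r_squared([dictionary], expert)

print(f"r = {r:.3f}, 95% LoA = [{loa[0]:.2f}, {loa[1]:.2f}], dR2 = {delta_r2:.3f}")
```

With real data, the per-dimension correlations and ΔR² values reported in the abstract would come from substituting the actual expert, dictionary, and model scores for the synthetic arrays above.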
Journal introduction:
PsyCh Journal, China's first international psychology journal, publishes peer-reviewed research articles, research reports, and integrated research reviews spanning the entire spectrum of scientific psychology and its applications. PsyCh Journal is the flagship journal of the Institute of Psychology, Chinese Academy of Sciences – the only national psychology research institute in China – and reflects the high research standards of the nation. Launched in 2012, PsyCh Journal is devoted to the publication of advanced research exploring basic mechanisms of the human mind and behavior, and delivering scientific knowledge to enhance understanding of culture and society. Towards that broader goal, the Journal provides a forum for academic exchange and a "knowledge bridge" between China and the world by showcasing high-quality, cutting-edge research related to the science and practice of psychology both within and outside of China. PsyCh Journal features original articles of both empirical and theoretical research in scientific psychology and interdisciplinary sciences, across all levels, from the molecular, cellular, and systems levels to the individual, group, and societal levels. The Journal also publishes evaluative and integrative review papers on any significant research contribution in any area of scientific psychology.