Fumeng Li, Nan Zhao
DOI: 10.1002/pchj.70048
PsyCh Journal (JCR Q3, Psychology, Multidisciplinary; Impact Factor 1.3), published 2025-08-30.
The Assessment of Body Image Based on Large Language Model.
Assessing adolescent body image is crucial for mental health interventions, yet traditional methods suffer from limited dimensional coverage, poor dynamic tracking, and weak ecological validity. To address these gaps, this study proposes a multidimensional evaluation using large language models (LLMs) and compares its criterion validity against a dictionary-based method and expert ratings. We defined four dimensions (perception, positive attitude, negative attitude, and behavior) by reviewing the body-image literature and built a validated dictionary through expert ratings and iterative refinement. A four-step prompt-engineering process, incorporating role-playing and other optimization techniques, produced tailored prompts for LLM-based recognition. To validate these tools, we collected self-reported texts and scale scores from 194 university students, performed semantic analyses with Llama-3.1-70B, Qwen-Max, and DeepSeek-R1 using these prompts, and confirmed ecological validity on social media posts. Results indicate that our multidimensional dictionary correlated significantly with expert ratings across all four dimensions (r = 0.515-0.625), providing a solid benchmark. LLM-based assessments then outperformed both the dictionary and human ratings, with zero-shot LLMs achieving r = 0.664 in positive attitude (vs. expert r = 0.657) and DeepSeek-R1 reaching r = 0.722 in perception. Role-playing techniques significantly improved the validity in the perception dimension (Δr = +0.117). Consistency checks revealed that the DeepSeek model reduced error dispersion in extreme score ranges by 48.4% compared to human ratings, with the 95% consistency limits covering the fluctuations of human scores. Incremental validity analysis showed that LLMs could replace human evaluations in the perception dimension (ΔR² = 0.220). In ecological validity checks, the Qwen model achieved a correlation of 0.651 in the social media behavior dimension, 53.1% higher than the dictionary method.
We found that LLMs demonstrated significant advantages in the multidimensional assessment of body image, offering a new intelligent approach to mental health measurement.
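The abstract's validity statistics rest on three standard computations: Pearson correlation for criterion validity, 95% limits of agreement for consistency checks, and ΔR² for incremental validity. A minimal sketch of how such statistics could be computed is given below, using synthetic illustrative data rather than the study's actual scores; all variable names and values are assumptions for demonstration only.

```python
import numpy as np

# Synthetic illustrative data standing in for expert ratings and LLM scores
# on 194 texts (values are NOT the study's data).
rng = np.random.default_rng(0)
expert = rng.normal(3.0, 1.0, 194)            # expert ratings on one dimension
llm = expert + rng.normal(0.0, 0.8, 194)      # hypothetical LLM scores
dictionary = expert + rng.normal(0.0, 1.2, 194)  # hypothetical dictionary scores

# Criterion validity: Pearson r between LLM scores and the expert benchmark.
r = np.corrcoef(llm, expert)[0, 1]

# Consistency check: Bland-Altman-style 95% limits of agreement,
# mean difference +/- 1.96 standard deviations of the differences.
diff = llm - expert
loa = (diff.mean() - 1.96 * diff.std(ddof=1),
       diff.mean() + 1.96 * diff.std(ddof=1))

def r_squared(predictors, y):
    """R^2 of an ordinary-least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Incremental validity: gain in explained variance when LLM scores are
# added on top of the dictionary baseline.
delta_r2 = r_squared([dictionary, llm], expert) - r_squared([dictionary], expert)

print(f"r = {r:.3f}, 95% LoA = [{loa[0]:.2f}, {loa[1]:.2f}], dR2 = {delta_r2:.3f}")
```

With real data, the per-dimension correlations and ΔR² values reported in the abstract would come from substituting the actual expert, dictionary, and model scores for the synthetic arrays above.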
Journal introduction:
PsyCh Journal, China's first international psychology journal, publishes peer-reviewed research articles, research reports, and integrated research reviews spanning the entire spectrum of scientific psychology and its applications. PsyCh Journal is the flagship journal of the Institute of Psychology, Chinese Academy of Sciences – the only national psychology research institute in China – and reflects the high research standards of the nation. Launched in 2012, PsyCh Journal is devoted to the publication of advanced research exploring basic mechanisms of the human mind and behavior, and delivering scientific knowledge to enhance understanding of culture and society. Towards that broader goal, the Journal provides a forum for academic exchange and a "knowledge bridge" between China and the world by showcasing high-quality, cutting-edge research related to the science and practice of psychology both within and outside of China. PsyCh Journal features original articles of both empirical and theoretical research in scientific psychology and interdisciplinary sciences, across all levels, from the molecular, cellular, and systems levels to the individual, group, and societal levels. The Journal also publishes evaluative and integrative review papers on any significant research contribution in any area of scientific psychology.