{"title":"语言模型代理在评价可视化方面与人类一致吗?实证研究。","authors":"Zekai Shao, Yi Shan, Yixuan He, Yuxuan Yao, Junhong Wang, Xiaolong Zhang, Yu Zhang, Siming Chen","doi":"10.1109/MCG.2025.3586461","DOIUrl":null,"url":null,"abstract":"<p><p>Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between LLM-based agents and human ratings in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study simulates six prior studies using agents and finds alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques like input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate the usage scenario in rapid prototype evaluation and discuss future directions. We note that simulation may only serve as complements and cannot replace user studies.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":""},"PeriodicalIF":1.4000,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Do Language Model Agents Align with Humans in Rating Visualizations? An Empirical Study.\",\"authors\":\"Zekai Shao, Yi Shan, Yixuan He, Yuxuan Yao, Junhong Wang, Xiaolong Zhang, Yu Zhang, Siming Chen\",\"doi\":\"10.1109/MCG.2025.3586461\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between LLM-based agents and human ratings in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study simulates six prior studies using agents and finds alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques like input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate the usage scenario in rapid prototype evaluation and discuss future directions. 
We note that simulation may only serve as complements and cannot replace user studies.</p>\",\"PeriodicalId\":55026,\"journal\":{\"name\":\"IEEE Computer Graphics and Applications\",\"volume\":\"PP \",\"pages\":\"\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2025-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Computer Graphics and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/MCG.2025.3586461\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Computer Graphics and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/MCG.2025.3586461","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Do Language Model Agents Align with Humans in Rating Visualizations? An Empirical Study.
Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between ratings from LLM-based agents and humans in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study uses agents to simulate six prior studies and finds that alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques such as input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate a usage scenario in rapid prototype evaluation and discuss future directions. We note that such simulations can only complement, not replace, user studies.
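To make the agent-rating setup concrete, below is a minimal Python sketch of how an LLM-based agent might produce Likert-style ratings of a visualization, with optional "knowledge injection" via prompt context. This is an illustrative assumption, not the authors' actual protocol: the function names (call_llm, rate_visualization, simulate_ratings), the 1-7 scale, and the prompt wording are all hypothetical.

from statistics import mean

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion client; wire up your own LLM API here."""
    raise NotImplementedError("connect an LLM client to run this sketch")

def rate_visualization(description: str, knowledge: list[str] | None = None) -> int:
    """Ask the agent for a 1-7 rating, optionally prepending expert design knowledge."""
    context = ""
    if knowledge:  # "knowledge injection": expert guidelines added to the prompt
        context = "Design guidelines:\n" + "\n".join(f"- {k}" for k in knowledge) + "\n\n"
    prompt = (
        f"{context}You are evaluating a data visualization.\n"
        f"Visualization: {description}\n"
        "Rate its readability from 1 (poor) to 7 (excellent). "
        "Reply with a single integer."
    )
    reply = call_llm(prompt)
    return max(1, min(7, int(reply.strip())))  # clamp to the Likert range

def simulate_ratings(description: str, n_agents: int = 20) -> float:
    """Average repeated agent ratings, mimicking a panel of human participants."""
    return mean(rate_visualization(description) for _ in range(n_agents))

In this sketch, repeated sampling stands in for a participant pool; comparing the resulting distribution against human ratings is what the paper's alignment analyses would then assess.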
Journal Introduction:
IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics, visualization, virtual and augmented reality, and HCI. From specific algorithms to full system implementations, CG&A offers a unique combination of peer-reviewed feature articles and informal departments. Theme issues guest edited by leading researchers in their fields track the latest developments and trends in computer-generated graphical content, while tutorials and surveys provide a broad overview of interesting and timely topics. Regular departments further explore the core areas of graphics as well as extend into topics such as usability, education, history, and opinion. In each issue, our cover story focuses on creative applications of the technology by an artist or designer. Published six times a year, CG&A is indispensable reading for people working at the leading edge of computer-generated graphics technology and its applications in everything from business to the arts.