{"title":"语言模型代理在评价可视化方面与人类一致吗?实证研究。","authors":"Zekai Shao, Yi Shan, Yixuan He, Yuxuan Yao, Junhong Wang, Xiaolong Zhang, Yu Zhang, Siming Chen","doi":"10.1109/MCG.2025.3586461","DOIUrl":null,"url":null,"abstract":"<p><p>Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between LLM-based agents and human ratings in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study simulates six prior studies using agents and finds alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques like input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate the usage scenario in rapid prototype evaluation and discuss future directions. We note that simulation may only serve as complements and cannot replace user studies.</p>","PeriodicalId":55026,"journal":{"name":"IEEE Computer Graphics and Applications","volume":"PP ","pages":""},"PeriodicalIF":1.4000,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Do Language Model Agents Align with Humans in Rating Visualizations? An Empirical Study.\",\"authors\":\"Zekai Shao, Yi Shan, Yixuan He, Yuxuan Yao, Junhong Wang, Xiaolong Zhang, Yu Zhang, Siming Chen\",\"doi\":\"10.1109/MCG.2025.3586461\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between LLM-based agents and human ratings in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study simulates six prior studies using agents and finds alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques like input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate the usage scenario in rapid prototype evaluation and discuss future directions. 
We note that simulation may only serve as complements and cannot replace user studies.</p>\",\"PeriodicalId\":55026,\"journal\":{\"name\":\"IEEE Computer Graphics and Applications\",\"volume\":\"PP \",\"pages\":\"\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2025-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Computer Graphics and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/MCG.2025.3586461\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Computer Graphics and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/MCG.2025.3586461","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Do Language Model Agents Align with Humans in Rating Visualizations? An Empirical Study.
Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between ratings from LLM-based agents and humans in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study uses agents to simulate six prior studies and finds that alignment correlates with experts' pre-experiment confidence. The third study tests enhancement techniques such as input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate a usage scenario in rapid prototype evaluation and discuss future directions. We note that such simulations can only complement, not replace, user studies.
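To make the agent-rating setup concrete, below is a minimal Python sketch of how an LLM-based agent might produce Likert-style ratings of a visualization, with optional "knowledge injection" via prompt context. This is an illustrative assumption, not the authors' actual protocol: the function names (call_llm, rate_visualization, simulate_ratings), the 1-7 scale, and the prompt wording are all hypothetical.

from statistics import mean

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion client; wire up your own LLM API here."""
    raise NotImplementedError("connect an LLM client to run this sketch")

def rate_visualization(description: str, knowledge: list[str] | None = None) -> int:
    """Ask the agent for a 1-7 rating, optionally prepending expert design knowledge."""
    context = ""
    if knowledge:  # "knowledge injection": expert guidelines added to the prompt
        context = "Design guidelines:\n" + "\n".join(f"- {k}" for k in knowledge) + "\n\n"
    prompt = (
        f"{context}You are evaluating a data visualization.\n"
        f"Visualization: {description}\n"
        "Rate its readability from 1 (poor) to 7 (excellent). "
        "Reply with a single integer."
    )
    reply = call_llm(prompt)
    return max(1, min(7, int(reply.strip())))  # clamp to the Likert range

def simulate_ratings(description: str, n_agents: int = 20) -> float:
    """Average repeated agent ratings, mimicking a panel of human participants."""
    return mean(rate_visualization(description) for _ in range(n_agents))

In this sketch, repeated sampling stands in for a participant pool; comparing the resulting distribution against human ratings is what the paper's alignment analyses would then assess.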
Journal Introduction:
IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics, visualization, virtual and augmented reality, and HCI. From specific algorithms to full system implementations, CG&A offers a unique combination of peer-reviewed feature articles and informal departments. Theme issues guest edited by leading researchers in their fields track the latest developments and trends in computer-generated graphical content, while tutorials and surveys provide a broad overview of interesting and timely topics. Regular departments further explore the core areas of graphics as well as extend into topics such as usability, education, history, and opinion. In each issue, our cover story focuses on creative applications of the technology by an artist or designer. Published six times a year, CG&A is indispensable reading for people working at the leading edge of computer-generated graphics technology and its applications in everything from business to the arts.