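Constructing a norm for children's scientific drawing: Distribution features based on semantic similarity of large language models.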

IF 1.3, JCR Q3, Biochemical Research Methods

Biology Methods and Protocols. Pub Date: 2025-08-11. eCollection Date: 2025-01-01. DOI: 10.1093/biomethods/bpaf062

Yi Zhang, Fan Wei, Jingyi Li, Yan Wang, Yanyan Yu, Jianli Chen, Zipo Cai, Xinyu Liu, Wei Wang, Sensen Yao, Peng Wang, Zhong Wang


Citations: 0

Abstract

Children's drawings have proven to be an effective way to examine their conceptual understanding, but previous research has two major problems: (i) the content of the drawings relies heavily on the task, so the ecological validity of the conclusions is low; (ii) the interpretation of drawings relies too much on the subjective feelings of the researchers. To address these issues, this study uses a large language model (LLM) to identify 1420 children's scientific drawings (covering nine scientific themes/concepts) and uses the word2vec algorithm to calculate their semantic similarity. The study explores whether children produce consistent drawing representations for the same theme and attempts to establish a norm for children's scientific drawings, providing a baseline reference for follow-up research on children's drawings. The results show that the representations of most drawings are consistent, with most semantic similarity values >0.8. At the same time, the consistency of the representation was found to be independent of the accuracy of the LLM's recognition, indicating the existence of consistency bias. In the subsequent exploration of influencing factors, we used the Kendall rank correlation coefficient to investigate the effects of "sample size," "abstract degree," and "focus points" on drawings, and used word frequency statistics to explore whether children represented abstract themes/concepts by reproducing what was taught in class. Accuracy of the LLM's recognition was found to be the most sensitive indicator, and data such as sample size and semantic similarity are related to it. The consistency between classroom experiments and teaching purpose is also an important factor: many students focus more on the experiments themselves than on what they explain. In addition, most children tend to use examples they have seen in class to represent more abstract themes/concepts, indicating that they may need concrete examples to understand abstract things.
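
The three quantitative steps named in the abstract (semantic similarity, Kendall rank correlation, word frequency statistics) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the vectors, labels, and theme-level numbers below are made-up toy values standing in for word2vec embeddings of LLM-recognized labels, and the choice to compare all pairs of drawings within a theme is an assumption, since the abstract does not say how similarity was aggregated. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def share_above(vectors, threshold=0.8):
    """Fraction of all vector pairs whose cosine similarity exceeds the threshold."""
    sims = [cosine_similarity(u, v) for u, v in combinations(vectors, 2)]
    return sum(s > threshold for s in sims) / len(sims)


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs. Assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy stand-ins for embeddings of the labels recognized in four drawings
# of one theme (hypothetical values, not data from the study).
theme_vectors = [
    [0.90, 0.10, 0.20],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.25],
    [0.10, 0.90, 0.30],
]
consistency = share_above(theme_vectors, threshold=0.8)

# Toy theme-level indicators (hypothetical): recognition accuracy and
# mean semantic similarity for four themes.
accuracy = [0.9, 0.7, 0.5, 0.3]
mean_similarity = [0.85, 0.90, 0.80, 0.70]
tau = kendall_tau(accuracy, mean_similarity)

# Word frequency over recognized labels (hypothetical labels).
labels = ["magnet", "magnet", "compass", "magnet"]
most_common_label = Counter(labels).most_common(1)

print(consistency, tau, most_common_label)
```

In practice the toy vectors would be replaced by vectors from a trained word2vec model, and a library routine that handles ties (a tau-b variant) would likely be preferable to this tie-free tau-a for real data.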


Source journal

Biology Methods and Protocols (Agricultural and Biological Sciences: Agricultural and Biological Sciences (all))

CiteScore: 3.80

Self-citation rate: 2.80%

Articles published:

Review time: 19 weeks