Uzair Shah, Naseem Khan, Mahmood Alzubaidi, Marco Agus, Mowafa Househ
{"title":"ArtInsight:一个多模态人工智能框架,用于解释儿童绘画和增强情感理解。","authors":"Uzair Shah, Naseem Khan, Mahmood Alzubaidi, Marco Agus, Mowafa Househ","doi":"10.3233/SHTI250471","DOIUrl":null,"url":null,"abstract":"<p><p>Recent advancements in multimodal image-to-text models have greatly enhanced the interpretation of children's drawings for emotional understanding purposes. This paper introduces a framework that analyzes these drawings to fully automatically generate detailed reports, covering art descriptions, emotional themes, assessments, and personalized recommendations. Our approach involved annotating 5,000 images by exploiting a Large Language Model (ChatGPT) and by fine-tuning the BLIP (Bootstrapping Language-Image Pre-training) multimodal model. We performed fine-tuning in two steps: 1) we applied Low-Rank Adaptation (LoRA) to the image encoder to preserve its pre-trained features while adapting it to our task, and 2) we refined the text decoder to capture the language patterns needed for comprehensive assessments. The system processes children's artwork as input, using multimodal image-to-text techniques to derive meaningful insights. Although these reports are initial evaluations rather than formal clinical assessments, they provide a valuable starting point for understanding children's emotional and psychological states. This tool can assist art therapists, educators, and parents in gaining a deeper understanding of children's inner worlds. Our research highlights the intersection of artificial intelligence and child psychology, showing how technology can complement human expertise in nurturing children's emotional well-being. By offering a structured, AI-driven analysis of children's drawings, this framework creates new opportunities for early intervention, personalized support, and enhanced communication between children and their caregivers. The impact of this work may extend beyond individual assessments, potentially informing broader strategies in child development, art therapy, and educational practices.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"327 ","pages":"808-812"},"PeriodicalIF":0.0000,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ArtInsight: A Multimodal AI Framework for Interpreting Children's Drawings and Enhancing Emotional Understanding.\",\"authors\":\"Uzair Shah, Naseem Khan, Mahmood Alzubaidi, Marco Agus, Mowafa Househ\",\"doi\":\"10.3233/SHTI250471\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Recent advancements in multimodal image-to-text models have greatly enhanced the interpretation of children's drawings for emotional understanding purposes. This paper introduces a framework that analyzes these drawings to fully automatically generate detailed reports, covering art descriptions, emotional themes, assessments, and personalized recommendations. Our approach involved annotating 5,000 images by exploiting a Large Language Model (ChatGPT) and by fine-tuning the BLIP (Bootstrapping Language-Image Pre-training) multimodal model. We performed fine-tuning in two steps: 1) we applied Low-Rank Adaptation (LoRA) to the image encoder to preserve its pre-trained features while adapting it to our task, and 2) we refined the text decoder to capture the language patterns needed for comprehensive assessments. 
The system processes children's artwork as input, using multimodal image-to-text techniques to derive meaningful insights. Although these reports are initial evaluations rather than formal clinical assessments, they provide a valuable starting point for understanding children's emotional and psychological states. This tool can assist art therapists, educators, and parents in gaining a deeper understanding of children's inner worlds. Our research highlights the intersection of artificial intelligence and child psychology, showing how technology can complement human expertise in nurturing children's emotional well-being. By offering a structured, AI-driven analysis of children's drawings, this framework creates new opportunities for early intervention, personalized support, and enhanced communication between children and their caregivers. The impact of this work may extend beyond individual assessments, potentially informing broader strategies in child development, art therapy, and educational practices.</p>\",\"PeriodicalId\":94357,\"journal\":{\"name\":\"Studies in health technology and informatics\",\"volume\":\"327 \",\"pages\":\"808-812\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Studies in health technology and informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/SHTI250471\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in health technology and informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/SHTI250471","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
ArtInsight: A Multimodal AI Framework for Interpreting Children's Drawings and Enhancing Emotional Understanding.
Recent advancements in multimodal image-to-text models have greatly enhanced the interpretation of children's drawings for emotional understanding purposes. This paper introduces a framework that analyzes these drawings to automatically generate detailed reports covering art descriptions, emotional themes, assessments, and personalized recommendations. Our approach involved annotating 5,000 images with a large language model (ChatGPT) and fine-tuning the BLIP (Bootstrapping Language-Image Pre-training) multimodal model on the resulting dataset. We performed fine-tuning in two steps: 1) we applied Low-Rank Adaptation (LoRA) to the image encoder to preserve its pre-trained features while adapting it to our task, and 2) we refined the text decoder to capture the language patterns needed for comprehensive assessments. The system takes children's artwork as input and uses multimodal image-to-text techniques to derive meaningful insights. Although these reports are initial evaluations rather than formal clinical assessments, they provide a valuable starting point for understanding children's emotional and psychological states. This tool can assist art therapists, educators, and parents in gaining a deeper understanding of children's inner worlds. Our research highlights the intersection of artificial intelligence and child psychology, showing how technology can complement human expertise in nurturing children's emotional well-being. By offering a structured, AI-driven analysis of children's drawings, this framework creates new opportunities for early intervention, personalized support, and enhanced communication between children and their caregivers. The impact of this work may extend beyond individual assessments, potentially informing broader strategies in child development, art therapy, and educational practices.
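The two-step fine-tuning described in the abstract can be sketched with the Hugging Face transformers and peft libraries. The block below is a minimal illustration rather than the authors' released code: the checkpoint name, LoRA hyperparameters, learning rate, and the targeted module name ("qkv", the attention projection inside the BLIP vision encoder) are assumptions about the model's internal layout, not details taken from the paper.

```python
# Minimal sketch (assumed details, not the authors' code) of the two-step fine-tuning:
#   1) LoRA adapters on the BLIP image encoder, 2) refinement of the text decoder.
import torch
from transformers import BlipProcessor, BlipForConditionalGeneration
from peft import LoraConfig, get_peft_model

# The processor would be used to build `pixel_values` and `input_ids` for each batch.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Step 1: add LoRA adapters to the vision encoder's attention projections. In this
# checkpoint only the vision tower uses a module named "qkv", so the adapters touch
# the image encoder while its pre-trained weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv"],  # assumed module name in BlipVisionModel attention layers
    bias="none",
)
model = get_peft_model(model, lora_config)

# Step 2: unfreeze the text decoder so it can learn the report-style language
# (art descriptions, emotional themes, assessments, recommendations).
for name, param in model.named_parameters():
    if "text_decoder" in name:
        param.requires_grad = True

model.print_trainable_parameters()

# Optimize only the trainable parameters: LoRA adapters plus the text decoder.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

def training_step(batch):
    """One step on an (image, report) pair from the ChatGPT-annotated dataset."""
    outputs = model(
        pixel_values=batch["pixel_values"],
        input_ids=batch["input_ids"],
        labels=batch["input_ids"],
    )
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

LoRA keeps the number of trainable parameters in the image encoder small, which is one way to realize the paper's stated goal of preserving pre-trained visual features while still adapting the text decoder to the structured report format.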