Anthony McCofie, Abhiram Kandiyana, Peter R Mouton, Yu Sun, Dmitry Goldgof
{"title":"基于视觉语言模型的短时提示婴儿哭声疼痛分类。","authors":"Anthony McCofie, Abhiram Kandiyana, Peter R Mouton, Yu Sun, Dmitry Goldgof","doi":"10.1109/cbms65348.2025.00174","DOIUrl":null,"url":null,"abstract":"<p><p>Accurately detecting pain in infants remains a complex challenge. Conventional deep neural networks used for analyzing infant cry sounds typically demand large labeled datasets, substantial computational power, and often lack interpretability. In this work, we introduce a novel approach that leverages OpenAI's vision-language model, GPT-4(V), combined with mel spectrogram-based representations of infant cries through prompting. This prompting strategy significantly reduces the dependence on large training datasets while enhancing transparency and interpretability. Using the USF-MNPAD-II dataset, our method achieves an accuracy of 83.33% with only 16 training samples, in contrast to the 4,914 samples required in the baseline model. To our knowledge, this represents the first application of few-shot prompting with vision-language models such as GPT-4o for infant pain classification.</p>","PeriodicalId":74567,"journal":{"name":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","volume":"2025 ","pages":"857-862"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444757/pdf/","citationCount":"0","resultStr":"{\"title\":\"Few-Shot Prompting with Vision Language Model for Pain Classification in Infant Cry Sounds.\",\"authors\":\"Anthony McCofie, Abhiram Kandiyana, Peter R Mouton, Yu Sun, Dmitry Goldgof\",\"doi\":\"10.1109/cbms65348.2025.00174\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Accurately detecting pain in infants remains a complex challenge. Conventional deep neural networks used for analyzing infant cry sounds typically demand large labeled datasets, substantial computational power, and often lack interpretability. In this work, we introduce a novel approach that leverages OpenAI's vision-language model, GPT-4(V), combined with mel spectrogram-based representations of infant cries through prompting. This prompting strategy significantly reduces the dependence on large training datasets while enhancing transparency and interpretability. Using the USF-MNPAD-II dataset, our method achieves an accuracy of 83.33% with only 16 training samples, in contrast to the 4,914 samples required in the baseline model. To our knowledge, this represents the first application of few-shot prompting with vision-language models such as GPT-4o for infant pain classification.</p>\",\"PeriodicalId\":74567,\"journal\":{\"name\":\"Proceedings. IEEE International Symposium on Computer-Based Medical Systems\",\"volume\":\"2025 \",\"pages\":\"857-862\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444757/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. 
IEEE International Symposium on Computer-Based Medical Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/cbms65348.2025.00174\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/7/4 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/cbms65348.2025.00174","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/7/4 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Few-Shot Prompting with Vision Language Model for Pain Classification in Infant Cry Sounds.
Accurately detecting pain in infants remains a complex challenge. Conventional deep neural networks used for analyzing infant cry sounds typically demand large labeled datasets and substantial computational power, and they often lack interpretability. In this work, we introduce a novel approach that leverages OpenAI's vision-language model GPT-4(V) by prompting it with mel spectrogram-based representations of infant cries. This prompting strategy significantly reduces the dependence on large training datasets while enhancing transparency and interpretability. Using the USF-MNPAD-II dataset, our method achieves an accuracy of 83.33% with only 16 training samples, in contrast to the 4,914 samples required by the baseline model. To our knowledge, this is the first application of few-shot prompting with vision-language models such as GPT-4o to infant pain classification.
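A minimal sketch of the pipeline the abstract describes, assuming librosa for mel spectrogram extraction and the OpenAI Python client for the vision-language call; the prompt wording, model name ("gpt-4o"), label set, and helper names are illustrative assumptions, not the authors' exact implementation.

# Sketch: few-shot prompting of a vision-language model with mel spectrogram images.
# Assumptions: librosa/matplotlib for spectrograms, OpenAI Python client for the VLM call;
# prompt text, model name, and labels are illustrative, not the paper's exact setup.
import base64
import io

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from openai import OpenAI


def cry_to_mel_png(wav_path: str) -> bytes:
    """Render an infant-cry audio clip as a mel spectrogram PNG."""
    y, sr = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    fig, ax = plt.subplots(figsize=(4, 3))
    librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel", ax=ax)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    return buf.getvalue()


def image_part(png_bytes: bytes) -> dict:
    """Wrap PNG bytes as an image message part for the chat completions API."""
    data = base64.b64encode(png_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{data}"}}


def classify_cry(query_wav: str, few_shot: list[tuple[str, str]]) -> str:
    """Few-shot prompt: labeled example spectrograms followed by the query spectrogram."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    content = [{"type": "text",
                "text": "Each image is a mel spectrogram of an infant cry. "
                        "Classify the final image as 'pain' or 'no pain'."}]
    for wav_path, label in few_shot:          # e.g. 16 labeled training samples
        content.append(image_part(cry_to_mel_png(wav_path)))
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append(image_part(cry_to_mel_png(query_wav)))
    content.append({"type": "text", "text": "Label:"})

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content.strip()

With 16 labeled clips passed as few_shot, this mirrors the few-shot regime the abstract reports; settings such as n_mels, the figure size, and the prompt format are guesses rather than the published configuration.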