Few-Shot Prompting with Vision Language Model for Pain Classification in Infant Cry Sounds.

Anthony McCofie, Abhiram Kandiyana, Peter R Mouton, Yu Sun, Dmitry Goldgof
{"title":"Few-Shot Prompting with Vision Language Model for Pain Classification in Infant Cry Sounds.","authors":"Anthony McCofie, Abhiram Kandiyana, Peter R Mouton, Yu Sun, Dmitry Goldgof","doi":"10.1109/cbms65348.2025.00174","DOIUrl":null,"url":null,"abstract":"<p><p>Accurately detecting pain in infants remains a complex challenge. Conventional deep neural networks used for analyzing infant cry sounds typically demand large labeled datasets, substantial computational power, and often lack interpretability. In this work, we introduce a novel approach that leverages OpenAI's vision-language model, GPT-4(V), combined with mel spectrogram-based representations of infant cries through prompting. This prompting strategy significantly reduces the dependence on large training datasets while enhancing transparency and interpretability. Using the USF-MNPAD-II dataset, our method achieves an accuracy of 83.33% with only 16 training samples, in contrast to the 4,914 samples required in the baseline model. To our knowledge, this represents the first application of few-shot prompting with vision-language models such as GPT-4o for infant pain classification.</p>","PeriodicalId":74567,"journal":{"name":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","volume":"2025 ","pages":"857-862"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444757/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/cbms65348.2025.00174","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/7/4 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Accurately detecting pain in infants remains a complex challenge. Conventional deep neural networks used for analyzing infant cry sounds typically demand large labeled datasets, substantial computational power, and often lack interpretability. In this work, we introduce a novel approach that leverages OpenAI's vision-language model, GPT-4(V), combined with mel spectrogram-based representations of infant cries through prompting. This prompting strategy significantly reduces the dependence on large training datasets while enhancing transparency and interpretability. Using the USF-MNPAD-II dataset, our method achieves an accuracy of 83.33% with only 16 training samples, in contrast to the 4,914 samples required in the baseline model. To our knowledge, this represents the first application of few-shot prompting with vision-language models such as GPT-4o for infant pain classification.
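The paper itself does not include code, but the pipeline the abstract describes (render each cry as a mel spectrogram image, then classify it with a few-shot image-plus-text prompt to a vision-language model) can be sketched as below. This is a minimal illustration assuming the openai Python SDK and librosa; the file names, prompt wording, helper functions, and two-example few-shot setup are hypothetical and are not the authors' implementation or the USF-MNPAD-II data.

```python
# Sketch: mel-spectrogram rendering + few-shot VLM prompt for pain vs. no-pain.
# Assumptions (not from the paper): example/query file names, prompt text,
# number of in-context examples, and figure size.
import base64
import io

import librosa
import librosa.display
import matplotlib.pyplot as plt
from openai import OpenAI


def mel_spectrogram_png(wav_path: str) -> str:
    """Render a mel spectrogram of the recording and return it base64-encoded."""
    y, sr = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=mel.max())

    fig, ax = plt.subplots(figsize=(4, 3))
    librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel", ax=ax)
    ax.set_axis_off()
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    return base64.b64encode(buf.getvalue()).decode()


def image_part(b64_png: str) -> dict:
    """Wrap a base64 PNG as an image content part for the chat API."""
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64_png}"}}


client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Few-shot examples: hypothetical labelled cry recordings shown to the model
# before the unlabelled query spectrogram.
shots = [("pain_example.wav", "pain"), ("no_pain_example.wav", "no pain")]

content = [{"type": "text",
            "text": "Each image is a mel spectrogram of an infant cry. "
                    "Classify the final spectrogram as 'pain' or 'no pain' "
                    "and briefly explain the acoustic cues you used."}]
for wav, label in shots:
    content.append(image_part(mel_spectrogram_png(wav)))
    content.append({"type": "text", "text": f"Label: {label}"})
content.append(image_part(mel_spectrogram_png("query_cry.wav")))

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```

Asking the model to explain its acoustic cues mirrors the interpretability benefit the abstract claims for prompting over conventional deep networks; the actual prompt design, number of examples, and evaluation protocol are those reported in the paper, not this sketch.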
