Few-Shot Prompting with Vision Language Model for Pain Classification in Infant Cry Sounds.

Anthony McCofie, Abhiram Kandiyana, Peter R Mouton, Yu Sun, Dmitry Goldgof
{"title":"Few-Shot Prompting with Vision Language Model for Pain Classification in Infant Cry Sounds.","authors":"Anthony McCofie, Abhiram Kandiyana, Peter R Mouton, Yu Sun, Dmitry Goldgof","doi":"10.1109/cbms65348.2025.00174","DOIUrl":null,"url":null,"abstract":"<p><p>Accurately detecting pain in infants remains a complex challenge. Conventional deep neural networks used for analyzing infant cry sounds typically demand large labeled datasets, substantial computational power, and often lack interpretability. In this work, we introduce a novel approach that leverages OpenAI's vision-language model, GPT-4(V), combined with mel spectrogram-based representations of infant cries through prompting. This prompting strategy significantly reduces the dependence on large training datasets while enhancing transparency and interpretability. Using the USF-MNPAD-II dataset, our method achieves an accuracy of 83.33% with only 16 training samples, in contrast to the 4,914 samples required in the baseline model. To our knowledge, this represents the first application of few-shot prompting with vision-language models such as GPT-4o for infant pain classification.</p>","PeriodicalId":74567,"journal":{"name":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","volume":"2025 ","pages":"857-862"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444757/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/cbms65348.2025.00174","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/7/4 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Accurately detecting pain in infants remains a complex challenge. Conventional deep neural networks used for analyzing infant cry sounds typically demand large labeled datasets, substantial computational power, and often lack interpretability. In this work, we introduce a novel approach that leverages OpenAI's vision-language model, GPT-4(V), combined with mel spectrogram-based representations of infant cries through prompting. This prompting strategy significantly reduces the dependence on large training datasets while enhancing transparency and interpretability. Using the USF-MNPAD-II dataset, our method achieves an accuracy of 83.33% with only 16 training samples, in contrast to the 4,914 samples required in the baseline model. To our knowledge, this represents the first application of few-shot prompting with vision-language models such as GPT-4o for infant pain classification.
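The paper itself does not include code, but the pipeline the abstract describes (render each cry as a mel spectrogram image, then classify it with a few-shot image-plus-text prompt to a vision-language model) can be sketched as below. This is a minimal illustration assuming the openai Python SDK and librosa; the file names, prompt wording, helper functions, and two-example few-shot setup are hypothetical and are not the authors' implementation or the USF-MNPAD-II data.

```python
# Sketch: mel-spectrogram rendering + few-shot VLM prompt for pain vs. no-pain.
# Assumptions (not from the paper): example/query file names, prompt text,
# number of in-context examples, and figure size.
import base64
import io

import librosa
import librosa.display
import matplotlib.pyplot as plt
from openai import OpenAI


def mel_spectrogram_png(wav_path: str) -> str:
    """Render a mel spectrogram of the recording and return it base64-encoded."""
    y, sr = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=mel.max())

    fig, ax = plt.subplots(figsize=(4, 3))
    librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel", ax=ax)
    ax.set_axis_off()
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    return base64.b64encode(buf.getvalue()).decode()


def image_part(b64_png: str) -> dict:
    """Wrap a base64 PNG as an image content part for the chat API."""
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64_png}"}}


client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Few-shot examples: hypothetical labelled cry recordings shown to the model
# before the unlabelled query spectrogram.
shots = [("pain_example.wav", "pain"), ("no_pain_example.wav", "no pain")]

content = [{"type": "text",
            "text": "Each image is a mel spectrogram of an infant cry. "
                    "Classify the final spectrogram as 'pain' or 'no pain' "
                    "and briefly explain the acoustic cues you used."}]
for wav, label in shots:
    content.append(image_part(mel_spectrogram_png(wav)))
    content.append({"type": "text", "text": f"Label: {label}"})
content.append(image_part(mel_spectrogram_png("query_cry.wav")))

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```

Asking the model to explain its acoustic cues mirrors the interpretability benefit the abstract claims for prompting over conventional deep networks; the actual prompt design, number of examples, and evaluation protocol are those reported in the paper, not this sketch.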
