Voice-Evoked Color Prediction Using Deep Neural Networks in Sound-Color Synesthesia.

IF 2.7 | Zone 3 (Medicine) | JCR Q3, Neurosciences

Brain Sciences | Pub Date: 2025-05-19 | DOI: 10.3390/brainsci15050520

Raminta Bartulienė, Aušra Saudargienė, Karolina Reinytė, Gustavas Davidavičius, Rūta Davidavičienė, Šarūnas Ašmantas, Gailius Raškinis, Saulius Šatkauskas

Brain Sciences, Vol. 15, No. 5 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12110112/pdf/

Citations: 0

Abstract

Background/Objectives: Synesthesia is an unusual neurological condition in which stimulation of one sensory modality automatically triggers a sensation in a second, unstimulated modality. In this study, we investigated a case of sound-color synesthesia in a visually impaired woman. After confirming that she is a genuine synesthete, we aimed to identify the sound features that play a key role in how she perceives sounds and in the colors they evoke.

Methods: We applied deep neural networks, with binary logistic regression as a benchmark, to classify voices into two synesthetically evoked color classes, blue and pink, using 136 voice features extracted from the voice recordings of eight study participants.

Results: The minimum Redundancy Maximum Relevance (mRMR) algorithm was applied to select the 20 most relevant voice features. An accuracy of 0.81 was already reached with only five features, and the best results were obtained with the 17 most informative features. The deep neural network classified previously unseen voice recordings with 0.84 accuracy, 0.81 specificity, 0.86 sensitivity, and F1-scores of 0.85 and 0.81 for the blue and pink classes, respectively. The machine learning algorithms revealed that voice parameters such as Mel-frequency cepstral coefficients (MFCCs), chroma vectors, and sound energy play the most significant role.

Conclusions: Our results suggest that the pitch, tone, and energy of a person's voice influence which color is perceived.
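The abstract states that mRMR narrowed the 136 voice features down to the 20 most relevant ones, but it does not say which mRMR variant or mutual-information estimator was used. The Python sketch below assumes the common difference criterion (relevance to the color label minus mean redundancy with the features already selected) and estimates mutual information by counting over equal-width bins. The function names, feature names, and toy values are hypothetical and are not the study's data or code.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences, by counting."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def discretize(values, bins=3):
    """Equal-width binning so MI can be estimated by counting (assumed estimator)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mrmr(features, labels, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy.

    features: dict mapping feature name -> list of per-recording values
    labels:   list of color classes, one per recording
    """
    disc = {name: discretize(col) for name, col in features.items()}
    relevance = {name: mutual_information(col, labels) for name, col in disc.items()}
    selected, candidates = [], set(disc)
    while candidates and len(selected) < k:
        scores = {}
        for name in sorted(candidates):  # sorted for deterministic tie-breaking
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(disc[name], disc[s]) for s in selected
                ) / len(selected)
            scores[name] = relevance[name] - redundancy
        best = max(scores, key=scores.get)  # first maximum wins on ties
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: 8 recordings, 4 rated blue and 4 rated pink.
labels = ["blue"] * 4 + ["pink"] * 4
features = {
    "mfcc": [0, 0, 0, 0, 1, 1, 1, 0],
    "mfcc_copy": [0, 0, 0, 0, 1, 1, 1, 0],  # exact duplicate: relevant but redundant
    "energy": [1, 1, 0, 0, 1, 1, 1, 1],     # weaker but complementary
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],      # independent of the labels
}
print(mrmr(features, labels, k=2))  # ['mfcc', 'energy']
```

The duplicated `mfcc_copy` is exactly as relevant to the label as `mfcc`, yet it is passed over in the second round because it adds no new information. This redundancy penalty is what separates mRMR from simply ranking features by relevance.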
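The abstract reports accuracy, sensitivity, specificity, and a separate F1-score for each color class. For a two-class problem, all of these follow from a single confusion matrix. The Python sketch below shows how. It treats blue as the positive class, which is an assumption because the abstract does not say which class was positive, and it uses made-up counts rather than the study's results. The helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    # Blue is treated as the positive class here; the abstract does not say
    # which class the authors used as positive (assumption).
    tp: int  # blue recordings predicted blue
    fn: int  # blue recordings predicted pink
    tn: int  # pink recordings predicted pink
    fp: int  # pink recordings predicted blue


def confusion_from_labels(y_true, y_pred, positive="blue"):
    """Count the four outcomes for a two-class problem."""
    c = Confusion(0, 0, 0, 0)
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                c.tp += 1
            else:
                c.fn += 1
        elif p == positive:
            c.fp += 1
        else:
            c.tn += 1
    return c


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def binary_metrics(c):
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # recall of the positive (blue) class
        "specificity": c.tn / (c.tn + c.fp),  # recall of the negative (pink) class
        "f1_blue": f1(c.tp, c.fp, c.fn),
        # Swapping roles for pink: hits are tn, false alarms are fn, misses are fp.
        "f1_pink": f1(c.tn, c.fn, c.fp),
    }


# Hypothetical counts for illustration only (not the study's results).
m = binary_metrics(Confusion(tp=40, fn=10, tn=35, fp=15))
print({k: round(v, 3) for k, v in m.items()})
```

With these made-up counts the sketch gives an accuracy of 0.75, a sensitivity of 0.80, a specificity of 0.70, and F1-scores of about 0.76 for blue and 0.74 for pink. Sensitivity and specificity are the recalls of the two classes, while each class's F1-score also depends on how many recordings of the other class were mislabeled as that class. This is why a classifier can report one sensitivity/specificity pair alongside two different F1-scores.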


Source journal: Brain Sciences (Neuroscience: General Neuroscience)

CiteScore: 4.80

Self-citation rate: 9.10%

Articles published: 1472

Review time: 18.71 days

Journal description: Brain Sciences (ISSN 2076-3425) is a peer-reviewed scientific journal that publishes original articles, critical reviews, research notes, and short communications in cognitive neuroscience, developmental neuroscience, molecular and cellular neuroscience, neural engineering, neuroimaging, neurolinguistics, neuropathy, systems neuroscience, and theoretical and computational neuroscience. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on paper length. Full experimental details must be provided so that the results can be reproduced. Electronic files or software detailing the calculations and experimental procedures that cannot be published in the usual way can be deposited as supplementary material.