TSFNet: A Temporal–Spectral Fusion Network for advanced speech emotion recognition in medical applications
Xinran Li, Peilin Huang, Xiaojiang Peng, Feng Sha, Xiaomao Fan, Ye Li
Artificial Intelligence in Medicine, vol. 170, Article 103279 (published 2025-10-03). DOI: 10.1016/j.artmed.2025.103279
https://www.sciencedirect.com/science/article/pii/S0933365725002143
Abstract
Speech emotion recognition (SER) is a critical component in enhancing communication systems and human–machine interaction, with significant potential for applications in the medical field. Although existing SER methods that combine temporal and spectral features have achieved notable advances, they still struggle to capture emotional nuances, which are vital in medical diagnostics and patient care. In this study, we introduce TSFNet, a straightforward yet highly efficient Temporal–Spectral Fusion Network built on a large-scale pre-trained model. The network is designed to process intricate emotional nuances by integrating the temporal and spectral information present in speech signals. The large-scale pre-trained model serves as a plug-and-play component for extracting and learning the temporal characteristics of speech, which lets TSFNet capture the complex emotional details that matter in medical applications more accurately. Extensive experiments on six public datasets show that TSFNet significantly outperforms existing baselines, achieving unweighted accuracies of 95.57% on Savee, 92.67% on Crema-D, 85.71% on IEMOCAP, 100.00% on Tess, 95.86% on Emovo, and 80.43% on Meld. These results suggest that TSFNet has the potential to advance medical diagnostic tools and patient monitoring systems.
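The results above are reported as unweighted accuracy (UA). The abstract does not define the metric, so this sketch assumes the definition common in SER work: UA is the mean of the per-class recalls, UA = (1/C) · Σ_c (correct_c / total_c). Under this definition every emotion class counts equally, however many utterances it has. The helpers `unweighted_accuracy` and `weighted_accuracy` are hypothetical illustrations, not code from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall (assumed SER definition of UA).

    Each class present in y_true contributes its own recall, and the
    recalls are averaged with equal weight per class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    correct = defaultdict(int)  # correctly classified utterances per true class
    total = defaultdict(int)    # all utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred):
    """Plain overall accuracy: fraction of all utterances classified correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Imbalanced toy labels: 8 "angry" utterances, 2 "sad" utterances.
    y_true = ["angry"] * 8 + ["sad"] * 2
    y_pred = ["angry"] * 8 + ["angry", "sad"]  # one "sad" misread as "angry"
    print(weighted_accuracy(y_true, y_pred))    # 0.9  (9 of 10 correct)
    print(unweighted_accuracy(y_true, y_pred))  # 0.75 (mean of 1.0 and 0.5)
```

On imbalanced emotion corpora the two metrics diverge. A model that neglects a rare class keeps a high overall accuracy, but its UA drops, which is why UA is the usual headline number in SER. If scikit-learn is available, `sklearn.metrics.balanced_accuracy_score` computes the same per-class-recall average.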
About the journal:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.