A Novel Classification Method with Cubic Spline Interpolation

Impact factor: 2.0 · Zone 4 (Computer Science) · JCR Q2 (Computer Science)

Intelligent Automation and Soft Computing, Vol. 31, No. 1, pp. 339-355. Pub Date: 2022-01-01. DOI: 10.32604/IASC.2022.018045

Husam Ali Abdulmohsin, H. A. Wahab, A. Hossen


Citations: 1

Abstract


Classification is the last, and usually the most time-consuming, step in recognition. Most recently proposed classification algorithms adopt machine learning (ML) as the main classification approach, regardless of time consumption. This study proposes a statistical feature classification cubic spline interpolation (FC-CSI) algorithm that classifies emotions in speech using a curve-fitting technique. FC-CSI is used in a speech emotion recognition system (SERS). The idea is to draw a cubic spline interpolation (CSI) for each audio file in a dataset and a mean cubic spline interpolation (MCSI) representing each emotion in the dataset. The CSI is generated by connecting the features extracted from each file in the feature extraction phase. The MCSI is generated by connecting the mean features of 70% of the files of each emotion in the dataset. Points on the CSI are treated as newly generated features. To classify each audio file according to emotion, the Euclidean distance (ED) is computed between its CSI and the MCSIs of all emotions in the dataset. Each audio file is assigned the emotion whose MCSI is nearest to the file's CSI. The three datasets used in this work are the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin (Emo-DB), and Surrey Audio-Visual Expressed Emotion (SAVEE). The proposed work shows fast classification and high accuracy. The classification accuracy, i.e., the proportion of samples assigned to the correct class, using FC-CSI without feature selection (FS) was 69.08%, 92.52%, and 89.1% on RAVDESS, Emo-DB, and SAVEE, respectively. The results of the proposed method were compared to those of a purpose-designed neural network called SER-NN. Comparisons were made with and without FS. Without an FS algorithm, FC-CSI outperformed SER-NN on Emo-DB and SAVEE and underperformed it on RAVDESS. Experiments showed that FC-CSI ran faster than the same system using SER-NN.
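The pipeline described in the abstract (one spline per file, one mean spline per emotion, nearest mean spline wins) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation. The abstract does not state the spline boundary conditions, the knot spacing, or how densely the curve is sampled, so the natural boundary condition, the knots placed at the feature index, the `points_per_gap` parameter, all function names, and the synthetic two-emotion data are assumptions of this sketch.

```python
import numpy as np


def spline_samples(features, points_per_gap=4):
    """Sample a natural cubic spline drawn through one feature vector.

    Assumption: knots sit at x = 0, 1, ..., n-1 (the feature index).
    """
    y = np.asarray(features, dtype=float)
    n = y.size
    # Solve for the second derivatives m at the knots.
    # Natural boundary condition: m[0] = m[-1] = 0.
    a = np.eye(n)
    rhs = np.zeros(n)
    for i in range(1, n - 1):
        a[i, i - 1:i + 2] = (1.0, 4.0, 1.0)
        rhs[i] = 6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1])
    m = np.linalg.solve(a, rhs)
    # Evaluate the piecewise cubic on a dense grid; these samples play
    # the role of the "new generated features".
    x = np.linspace(0.0, n - 1, (n - 1) * points_per_gap + 1)
    i = np.minimum(x.astype(int), n - 2)  # index of the left knot
    t = x - i                             # position inside the gap, in [0, 1]
    return (m[i] * (1 - t) ** 3 / 6 + m[i + 1] * t ** 3 / 6
            + (y[i] - m[i] / 6) * (1 - t) + (y[i + 1] - m[i + 1] / 6) * t)


def fit_mean_splines(train_x, train_y):
    """Build one mean spline (MCSI) per label from the mean feature vector."""
    return {label: spline_samples(train_x[train_y == label].mean(axis=0))
            for label in np.unique(train_y)}


def classify(features, mean_splines):
    """Return the label whose mean spline is nearest in Euclidean distance."""
    curve = spline_samples(features)
    return min(mean_splines,
               key=lambda label: np.linalg.norm(curve - mean_splines[label]))


# Synthetic stand-in for extracted features: 10 "files" per emotion.
rng = np.random.default_rng(0)
centers = {"calm": [0, 1, 0, 1, 0, 1], "angry": [1, 0, 1, 0, 1, 0]}
train_x, train_y, test_x, test_y = [], [], [], []
for label, center in centers.items():
    files = np.asarray(center, dtype=float) + 0.05 * rng.standard_normal((10, 6))
    train_x.extend(files[:7])   # 70% of each emotion builds the MCSI
    train_y.extend([label] * 7)
    test_x.extend(files[7:])    # the remaining files are classified
    test_y.extend([label] * 3)

mean_splines = fit_mean_splines(np.array(train_x), np.array(train_y))
predicted = [classify(f, mean_splines) for f in test_x]
print(list(zip(test_y, predicted)))
```

No model is trained here: fitting amounts to one mean and one small linear solve per emotion, and classifying a file costs one spline evaluation plus one distance per emotion, which is consistent with the speed claim in the abstract.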


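Source journal

Intelligent Automation and Soft Computing (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 3.50

Self-citation rate: 10.00%

Articles published: 429

Review time: 10.8 months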

Journal description: The journal seeks to provide a common forum for the dissemination of accurate results about the world of intelligent automation, artificial intelligence, computer science, control, intelligent data science, modeling and systems engineering. Articles published in the journal are intended to encompass both the short- and long-term effects of soft computing and related fields such as robotics, control, computer vision, speech recognition, pattern recognition, data mining, big data, data analytics, machine intelligence, cyber security and deep learning. The journal also aims to address the existing and emerging relationships between automation, systems engineering, system of systems engineering and soft computing. It publishes original and survey papers on artificial intelligence, intelligent automation and computer engineering, with an emphasis on current and potential applications of soft computing. It has a broad interest in all engineering disciplines, computer science, and related technological fields such as medicine, biology, operations research, technology management, agriculture and information technology.