{"title":"基于语音情感分析的智能多功能数字内容生态系统","authors":"A. Iliev, P. Stanchev","doi":"10.1145/3134302.3134342","DOIUrl":null,"url":null,"abstract":"In an attempt to establish an improved service-oriented architecture (SOA) for interoperable and customizable access of digital cultural resources an automatic deterministic technique can potentially lead to the improvement of searching, recommending and personalizing of content. Such technique can be developed in many ways using different means for data search and analysis. This paper focuses on the use of voice and emotion recognition in speech as a main vehicle for delivering an alternative way to develop novel solutions for integrating the loosely connected components that exchange information based on a common data model. The parameters used to construct the feature vectors for analysis carried pitch, temporal and duration information. They were compared to the glottal symmetry extracted from the speech source using inverse filtering. A comparison to their first derivatives was also a subject of investigation in this paper. The speech source was a 100-minute long theatrical play containing four male speakers and was recorder at 8kHz with 16-bit sample resolution. Four emotional states were targeted namely: happy, angry, fear, and neutral. Classification was performed using k-Nearest Neighbor method. Training and testing experiments were performed in three scenarios: 60/40, 70/30 and 80/20 minutes respectively. A close comparison of each feature and its rate of change show that the time-domain features perform better while using lesser computational strain than their first derivative counterparts. Furthermore, a correct recognition rate was achieved of up 95% using the chosen features.","PeriodicalId":131196,"journal":{"name":"Proceedings of the 18th International Conference on Computer Systems and Technologies","volume":"344 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Smart Multifunctional Digital Content Ecosystem Using Emotion Analysis of Voice\",\"authors\":\"A. Iliev, P. Stanchev\",\"doi\":\"10.1145/3134302.3134342\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In an attempt to establish an improved service-oriented architecture (SOA) for interoperable and customizable access of digital cultural resources an automatic deterministic technique can potentially lead to the improvement of searching, recommending and personalizing of content. Such technique can be developed in many ways using different means for data search and analysis. This paper focuses on the use of voice and emotion recognition in speech as a main vehicle for delivering an alternative way to develop novel solutions for integrating the loosely connected components that exchange information based on a common data model. The parameters used to construct the feature vectors for analysis carried pitch, temporal and duration information. They were compared to the glottal symmetry extracted from the speech source using inverse filtering. A comparison to their first derivatives was also a subject of investigation in this paper. The speech source was a 100-minute long theatrical play containing four male speakers and was recorder at 8kHz with 16-bit sample resolution. Four emotional states were targeted namely: happy, angry, fear, and neutral. Classification was performed using k-Nearest Neighbor method. 
Training and testing experiments were performed in three scenarios: 60/40, 70/30 and 80/20 minutes respectively. A close comparison of each feature and its rate of change show that the time-domain features perform better while using lesser computational strain than their first derivative counterparts. Furthermore, a correct recognition rate was achieved of up 95% using the chosen features.\",\"PeriodicalId\":131196,\"journal\":{\"name\":\"Proceedings of the 18th International Conference on Computer Systems and Technologies\",\"volume\":\"344 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 18th International Conference on Computer Systems and Technologies\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3134302.3134342\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 18th International Conference on Computer Systems and Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3134302.3134342","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Smart Multifunctional Digital Content Ecosystem Using Emotion Analysis of Voice
In an attempt to establish an improved service-oriented architecture (SOA) for interoperable and customizable access to digital cultural resources, an automatic deterministic technique can potentially improve the searching, recommendation, and personalization of content. Such a technique can be developed in many ways, using different means of data search and analysis. This paper focuses on the use of voice and emotion recognition in speech as the main vehicle for delivering an alternative way to develop novel solutions for integrating loosely connected components that exchange information based on a common data model. The parameters used to construct the feature vectors for analysis carried pitch, temporal, and duration information. They were compared to the glottal symmetry extracted from the speech source using inverse filtering. A comparison to their first derivatives was also investigated in this paper. The speech source was a 100-minute-long theatrical play containing four male speakers and was recorded at 8 kHz with 16-bit sample resolution. Four emotional states were targeted, namely: happy, angry, fear, and neutral. Classification was performed using the k-Nearest Neighbor method. Training and testing experiments were performed in three scenarios: 60/40, 70/30, and 80/20 minutes, respectively. A close comparison of each feature and its rate of change shows that the time-domain features perform better, while requiring less computation, than their first-derivative counterparts. Furthermore, a correct recognition rate of up to 95% was achieved using the chosen features.
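The following is a minimal sketch of the kind of pipeline the abstract describes: frame-level pitch, energy, and duration statistics are computed per speech segment and classified with k-Nearest Neighbor. It is not the authors' implementation; the feature definitions, file paths, parameter values, and the librosa/scikit-learn calls are assumptions chosen for illustration.

# Hypothetical sketch: pitch/temporal/duration features + k-NN emotion classification.
# Assumes librosa and scikit-learn are installed; paths and labels are placeholders.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

EMOTIONS = ["happy", "angry", "fear", "neutral"]

def segment_features(path, sr=8000):
    """Build a small pitch/energy/duration feature vector for one speech segment."""
    y, sr = librosa.load(path, sr=sr)               # source material was 8 kHz, 16-bit
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # frame-level pitch estimates
    f0 = f0[np.isfinite(f0)]
    rms = librosa.feature.rms(y=y)[0]               # short-time energy (temporal cue)
    return np.array([
        f0.mean(), f0.std(),                        # pitch statistics
        rms.mean(), rms.std(),                      # energy statistics
        len(y) / sr,                                # segment duration in seconds
    ])

def train_and_test(train_items, test_items, k=5):
    """train_items/test_items: lists of (wav_path, emotion_label) pairs."""
    X_tr = np.vstack([segment_features(p) for p, _ in train_items])
    y_tr = [label for _, label in train_items]
    X_te = np.vstack([segment_features(p) for p, _ in test_items])
    y_te = [label for _, label in test_items]
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)                    # fraction of segments correctly recognized

In this sketch, repeating train_and_test with splits corresponding to 60/40, 70/30, and 80/20 minutes of material would mirror the three training/testing scenarios reported in the paper; glottal-symmetry features and first derivatives would be additional columns in the feature vector.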