{"title":"网络歌曲和演讲中四种情感模式的分类","authors":"Chien-Hung Chen, Ping-Tsung Lu, O. Chen","doi":"10.1109/WOCC.2010.5510629","DOIUrl":null,"url":null,"abstract":"The amount of multimedia sources from websites is extremely growing up every day. How to effectively search data and to find out what we need becomes a critical issue. In this work, four affective modes of exciting/happy, angry, sad and calm in songs and speeches are investigated. A song clip is partitioned into the main and refrain parts each of which is analyzed by the tempo, normalized intensity mean and rhythm regularity. In a speech clip, the standard deviation of fundamental frequencies, the standard deviation of pauses and the mean of zero crossing rates are computed to understand a speaker's emotion. Particularly, the Gaussian mixture model is built and used for classification. In our experimental results, the averaged accuracies associated with the main and refrain parts of songs, and speeches can be 55%, 60% and 80%, respectively. Therefore, the method proposed herein can be employed to analyze songs and speeches downloaded from websites, and then provide emotion information to a user.","PeriodicalId":427398,"journal":{"name":"The 19th Annual Wireless and Optical Communications Conference (WOCC 2010)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Classification of four affective modes in online songs and speeches\",\"authors\":\"Chien-Hung Chen, Ping-Tsung Lu, O. Chen\",\"doi\":\"10.1109/WOCC.2010.5510629\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The amount of multimedia sources from websites is extremely growing up every day. How to effectively search data and to find out what we need becomes a critical issue. In this work, four affective modes of exciting/happy, angry, sad and calm in songs and speeches are investigated. A song clip is partitioned into the main and refrain parts each of which is analyzed by the tempo, normalized intensity mean and rhythm regularity. In a speech clip, the standard deviation of fundamental frequencies, the standard deviation of pauses and the mean of zero crossing rates are computed to understand a speaker's emotion. Particularly, the Gaussian mixture model is built and used for classification. In our experimental results, the averaged accuracies associated with the main and refrain parts of songs, and speeches can be 55%, 60% and 80%, respectively. 
Therefore, the method proposed herein can be employed to analyze songs and speeches downloaded from websites, and then provide emotion information to a user.\",\"PeriodicalId\":427398,\"journal\":{\"name\":\"The 19th Annual Wireless and Optical Communications Conference (WOCC 2010)\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 19th Annual Wireless and Optical Communications Conference (WOCC 2010)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WOCC.2010.5510629\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 19th Annual Wireless and Optical Communications Conference (WOCC 2010)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WOCC.2010.5510629","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Classification of four affective modes in online songs and speeches
The amount of multimedia content available on websites grows rapidly every day, so effectively searching this data and finding what we need has become a critical issue. In this work, four affective modes (exciting/happy, angry, sad, and calm) in songs and speeches are investigated. A song clip is partitioned into main and refrain parts, each of which is analyzed in terms of tempo, normalized intensity mean, and rhythm regularity. For a speech clip, the standard deviation of fundamental frequencies, the standard deviation of pauses, and the mean of zero-crossing rates are computed to characterize the speaker's emotion. In particular, a Gaussian mixture model is built and used for classification. In our experiments, the average accuracies for the main parts of songs, the refrain parts of songs, and speeches reach 55%, 60%, and 80%, respectively. The proposed method can therefore be used to analyze songs and speeches downloaded from websites and provide emotion information to the user.
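The abstract describes the speech-side pipeline but gives no implementation details, so the following is a minimal sketch, under assumed settings, of how the three named speech features (standard deviation of fundamental frequencies, standard deviation of pauses, and mean zero-crossing rate) could feed a Gaussian-mixture classifier using librosa and scikit-learn. The per-mode GMM arrangement, frame sizes, pause threshold, and mixture configuration are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: classify a speech clip into one of four affective modes
# by fitting one Gaussian mixture model per mode on three prosodic features.
# Feature-extraction settings (F0 range, pause threshold, GMM size) are assumed.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

MODES = ["exciting/happy", "angry", "sad", "calm"]

def speech_features(path, sr=16000):
    """Return [std of F0, std of pause lengths, mean zero-crossing rate]."""
    y, sr = librosa.load(path, sr=sr)

    # Fundamental frequency per frame via pYIN; unvoiced frames are NaN.
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0_std = float(np.nanstd(f0[voiced])) if np.any(voiced) else 0.0

    # Pauses approximated as runs of low-energy frames (threshold is assumed).
    rms = librosa.feature.rms(y=y)[0]
    silent = rms < 0.1 * rms.max()
    pause_lengths, run = [], 0
    for s in silent:
        if s:
            run += 1
        elif run > 0:
            pause_lengths.append(run)
            run = 0
    if run > 0:
        pause_lengths.append(run)
    pause_std = float(np.std(pause_lengths)) if pause_lengths else 0.0

    # Mean zero-crossing rate over all frames.
    zcr_mean = float(librosa.feature.zero_crossing_rate(y)[0].mean())

    return np.array([f0_std, pause_std, zcr_mean])

def train(clips_by_mode):
    """clips_by_mode: {mode: list of feature vectors}; one GMM per mode."""
    models = {}
    for mode, feats in clips_by_mode.items():
        gmm = GaussianMixture(n_components=2, covariance_type="diag",
                              random_state=0)
        gmm.fit(np.vstack(feats))
        models[mode] = gmm
    return models

def classify(models, feat):
    """Pick the mode whose GMM assigns the highest log-likelihood."""
    feat = feat.reshape(1, -1)
    return max(models, key=lambda m: models[m].score(feat))
```

A similar arrangement could be assumed for the song side, with tempo, normalized intensity mean, and rhythm regularity as the feature vector for the main and refrain parts.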