Visualizing video sounds with sound word animation
Authors: Fangzhou Wang, H. Nagano, K. Kashino, T. Igarashi
Published in: 2015 IEEE International Conference on Multimedia and Expo (ICME)
Publication date: 2015-08-06
DOI: 10.1109/ICME.2015.7177422 (https://doi.org/10.1109/ICME.2015.7177422)
Citation count: 4
Abstract
Text captions are an important means of providing sound information in videos when the sound is not accessible. However, conventional text captions are far less expressive for non-verbal sounds, since they are designed to visualize speech. To address this problem, we propose a method for automatically transforming non-verbal video sounds into animated sound words and positioning them near the sound source objects in the video. This provides a natural visual representation of non-verbal sounds that conveys rich information about the sound category and dynamics. We conducted a user study with over 300 participants using an online crowdsourcing service. The results showed that animated sound words not only visualize the dynamics of sound effectively and naturally while clarifying the position of the sound source, but also make video watching more enjoyable and increase the visual impact of the video.