{"title":"Real-time Sound Visualization via Multidimensional Clustering and Projections","authors":"N. Le, Ngan V. T. Nguyen, Tommy Dang","doi":"10.1145/3468784.3471604","DOIUrl":null,"url":null,"abstract":"Sound plays a vital role in every aspect of human life since it is one of the primary sensory information that our auditory system collects and allows us to perceive the world. Sound clustering and visualization is the process of collecting and analyzing audio samples; that process is a prerequisite of sound classification, which is the core of automatic speech recognition, virtual assistants, and text to speech applications. Nevertheless, understanding how to recognize and properly interpret complex, high-dimensional audio data is the most significant challenge in sound clustering and visualization. This paper proposed a web-based platform to visualize and cluster similar sound samples of musical notes and human speech in real-time. For visualizing high-dimensional data like audio, Mel-Frequency Cepstral Coefficients (MFCCs) were initially developed to represent the sounds made by the human vocal tract are extracted. Then, t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique, was designed for high dimensional datasets is applied. This paper focuses on both data clustering and high-dimensional visualization methods to properly present the clustering results in the most meaningful way to uncover potentially interesting behavioral patterns of musical notes played by different instruments.","PeriodicalId":341589,"journal":{"name":"The 12th International Conference on Advances in Information Technology","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 12th International Conference on Advances in Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3468784.3471604","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Sound plays a vital role in every aspect of human life, since it is one of the primary forms of sensory information that our auditory system collects and that allows us to perceive the world. Sound clustering and visualization is the process of collecting and analyzing audio samples; it is a prerequisite of sound classification, which is the core of automatic speech recognition, virtual assistants, and text-to-speech applications. Nevertheless, understanding how to recognize and properly interpret complex, high-dimensional audio data is the most significant challenge in sound clustering and visualization. This paper proposes a web-based platform to visualize and cluster similar sound samples of musical notes and human speech in real time. To visualize high-dimensional data such as audio, Mel-Frequency Cepstral Coefficients (MFCCs), which were originally developed to represent the sounds made by the human vocal tract, are first extracted. Then, t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique designed for high-dimensional datasets, is applied. This paper focuses on both data clustering and high-dimensional visualization methods, presenting the clustering results in a meaningful way so as to uncover potentially interesting behavioral patterns of musical notes played by different instruments.
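The two-stage pipeline the abstract describes (MFCC feature extraction followed by t-SNE projection) can be sketched as follows. This is a minimal illustration only, assuming the librosa and scikit-learn libraries and hypothetical audio file names; the paper's actual platform is web-based and its implementation details are not given in the abstract.

```python
# Minimal sketch of the MFCC + t-SNE pipeline described in the abstract.
# Assumptions: librosa for audio loading/MFCCs, scikit-learn for t-SNE;
# the file names below are hypothetical placeholders.
import numpy as np
import librosa
from sklearn.manifold import TSNE

def mfcc_features(path, n_mfcc=13):
    """Load an audio file and summarize it as a fixed-length MFCC vector."""
    y, sr = librosa.load(path, sr=None)  # keep the file's native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)  # average over time to get one vector per sample

# Hypothetical sound samples (musical notes and speech, as in the paper).
paths = ["note_piano_c4.wav", "note_violin_c4.wav", "speech_sample.wav"]
features = np.stack([mfcc_features(p) for p in paths])

# Project the high-dimensional MFCC vectors to 2-D for visualization.
# perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(features)
print(embedding)  # one (x, y) point per sound sample, ready to plot or cluster
```

In this sketch each sample is reduced to a single time-averaged MFCC vector for simplicity; a real-time system like the one described would instead stream per-frame features, and similar sounds would then appear as nearby points (clusters) in the 2-D embedding.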