{"title":"Hear and See: End-to-end sound classification and visualization of classified sounds","authors":"Thomas Miano","doi":"10.7287/peerj.preprints.27280v1","DOIUrl":null,"url":null,"abstract":"Machine learning is a field of study that uses computational and statistical techniques to enable computers to learn. When machine learning is applied, it functions as an instrument that can solve problems or expand knowledge about the surrounding world. Increasingly, machine learning is also an instrument for artistic expression in digital and non-digital media. While painted art has existed for thousands of years, the oldest digital art is less than a century old. Digital media as an art form is a relatively nascent, and the practice of machine learning in digital art is even more recent. Across all artistic media, a piece is powerful when it can captivate its consumer. Such captivation can be elicited through through a wide variety of methods including but not limited to distinct technique, emotionally evocative communication, and aesthetically pleasing combinations of textures. This work aims to explore how machine learning can be used simultaneously as a scientific instrument for understanding the world and as an artistic instrument for inspiring awe. Specifically, our goal is to build an end-to-end system that uses modern machine learning techniques to accurately recognize sounds in the natural environment and to communicate via visualization those sounds that it has recognized. We validate existing research by finding that convolutional neural networks, when paired with transfer learning using out-of-domain data, can be successful in mapping an image classification task to a sound classification task. 
Our work offers a novel application where the model used for performant sound classification is also used for visualization in an end-to-end, sound-to-image system.","PeriodicalId":93040,"journal":{"name":"PeerJ preprints","volume":"44 1","pages":"e27280"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ preprints","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7287/peerj.preprints.27280v1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Machine learning is a field of study that uses computational and statistical techniques to enable computers to learn. When machine learning is applied, it functions as an instrument that can solve problems or expand knowledge about the surrounding world. Increasingly, machine learning is also an instrument for artistic expression in digital and non-digital media. While painted art has existed for thousands of years, the oldest digital art is less than a century old. Digital media as an art form is relatively nascent, and the practice of machine learning in digital art is even more recent. Across all artistic media, a piece is powerful when it can captivate its consumer. Such captivation can be elicited through a wide variety of methods, including but not limited to distinct technique, emotionally evocative communication, and aesthetically pleasing combinations of textures. This work aims to explore how machine learning can be used simultaneously as a scientific instrument for understanding the world and as an artistic instrument for inspiring awe. Specifically, our goal is to build an end-to-end system that uses modern machine learning techniques to accurately recognize sounds in the natural environment and to communicate via visualization those sounds that it has recognized. We validate existing research by finding that convolutional neural networks, when paired with transfer learning using out-of-domain data, can be successful in mapping an image classification task to a sound classification task. Our work offers a novel application where the model used for performant sound classification is also used for visualization in an end-to-end, sound-to-image system.
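The abstract does not spell out the pipeline, but the standard way a convolutional network trained on images is transferred to sound classification is to render the audio as a log-spectrogram and treat that 2-D array as the input image. The following is a minimal NumPy sketch of that idea under stated assumptions: the frame size, hop length, and the 440 Hz test tone are illustrative choices, not values from the paper, and the channel-replication step merely mimics the RGB input an image CNN expects.

```python
import numpy as np

def log_spectrogram(signal, n_fft=256, hop=128):
    """Frame the signal, apply a Hann window, and take the log-magnitude FFT."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    mag = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log1p(mag).T  # shape: (freq_bins, time_frames), image-like

# A one-second 440 Hz tone stands in for a recorded environmental sound.
sr = 8000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(audio)                    # 2-D "picture" of the sound
image = np.repeat(spec[np.newaxis], 3, axis=0)   # replicate to 3 channels for an image CNN
print(image.shape)                               # (3, 129, 61)
```

From here, `image` could be resized and fed to a pretrained image classifier whose final layer is fine-tuned on sound labels, which is the transfer-learning mapping the abstract describes.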