Enabling an IoT System of Systems through Auto Sound Synthesis in Silent Video with DNN
Sanchita Ghose, John J. Prevost
2020 IEEE 15th International Conference of System of Systems Engineering (SoSE)
DOI: 10.1109/SoSE50414.2020.9130483
Published: 2020-06-01
Citations: 3
Abstract
The Internet of Things has enabled a wide variety of new applications that were not possible only a short time ago. Sensing data found at the Edge of the network, close to the environment where people are found, is a critical component of many modern applications. Often, restrictions at the device level or on the available bandwidth limit the ability to capture all of the locally available data required for processing and analysis. In this research, we present a novel method for extracting sound from video data where no original sound was present. Our method of sound synthesis first uses the image features output by a Convolutional Neural Network (CNN) to determine class prediction weights with an advanced Long Short-Term Memory (LSTM) network. A Generative Adversarial Network (GAN) is then used to generate the representative sound of the predicted class for the input video sample. By combining the output of many Auto Sound Generators in a System of Systems framework, we show that new applications emerge that were never before possible.
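The CNN → LSTM → GAN pipeline the abstract describes can be sketched schematically as a single forward pass. The sketch below is a minimal illustration of the data flow only: the layer sizes, feature dimensions, random stand-in weights, and function names are all assumptions for exposition, not the authors' trained architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_features(frames, feat_dim=128):
    """Stand-in for the CNN stage: global-average-pool each frame,
    then apply a random linear projection to get one feature
    vector per frame."""
    t = frames.shape[0]
    pooled = frames.reshape(t, -1, frames.shape[-1]).mean(axis=1)  # (T, C)
    w = rng.standard_normal((frames.shape[-1], feat_dim))
    return pooled @ w                                              # (T, feat_dim)

def lstm_class_weights(feats, n_classes=10, hidden=64):
    """Stand-in for the LSTM stage: a simple recurrent tanh cell
    unrolled over the frame sequence, followed by a softmax that
    yields the class prediction weights."""
    feat_dim = feats.shape[1]
    w_in = rng.standard_normal((feat_dim, hidden)) * 0.1
    w_rec = rng.standard_normal((hidden, hidden)) * 0.1
    h = np.zeros(hidden)
    for x in feats:                       # one step per video frame
        h = np.tanh(x @ w_in + h @ w_rec)
    logits = h @ rng.standard_normal((hidden, n_classes))
    e = np.exp(logits - logits.max())
    return e / e.sum()                    # (n_classes,), sums to 1

def gan_generate(class_weights, z_dim=32, n_samples=16000):
    """Stand-in for the GAN generator: noise conditioned on the
    predicted class weights, mapped to a 1-second waveform."""
    z = rng.standard_normal(z_dim)
    cond = np.concatenate([z, class_weights])
    w = rng.standard_normal((cond.size, n_samples)) * 0.01
    return np.tanh(cond @ w)              # samples bounded in [-1, 1]

# End-to-end pass over a dummy 8-frame silent clip (64x64 RGB)
frames = rng.standard_normal((8, 64, 64, 3))
feats = cnn_features(frames)              # per-frame image features
weights = lstm_class_weights(feats)       # predicted sound class
audio = gan_generate(weights)             # synthesized waveform
```

Each stage here uses untrained random weights, so the output is noise; the point is only the shape of the hand-off between stages: per-frame features feed a temporal model, whose class distribution conditions the generator.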