{"title":"Exploiting Stereo Sound Channels to Boost Performance of Neural Network-Based Music Transcription","authors":"Xian Wang, Lingqiao Liu, Javen Qinfeng Shi","doi":"10.1109/ICMLA.2019.00220","DOIUrl":null,"url":null,"abstract":"In recent years deep learning begins to show great potential for automatic music transcription that reproduces MIDI-like music composition information, such as note pitches and onset and offset times, from music recordings. In the literature without exception the two stereo sound channels coming with music recordings were averaged into a single channel to alleviate the computation overhead, which, from an entropy standpoint, definitely sacrifices information. In this paper we propose a method to properly combine the two sound channels for deep learning-based pitch detection. In particular, through modifying the loss function the network is forced to focus on the worse performing sound channel. This method achieves start-of-the-art frame-wise pitch detection performance on the MAPS dataset.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2019.00220","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
In recent years, deep learning has begun to show great potential for automatic music transcription, which reproduces MIDI-like music composition information, such as note pitches and onset and offset times, from music recordings. In the literature, the two stereo sound channels that come with music recordings are, without exception, averaged into a single channel to reduce computational overhead, which, from an entropy standpoint, necessarily sacrifices information. In this paper, we propose a method to properly combine the two sound channels for deep learning-based pitch detection. In particular, by modifying the loss function, the network is forced to focus on the worse-performing sound channel. This method achieves state-of-the-art frame-wise pitch detection performance on the MAPS dataset.
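The abstract's central idea, reweighting the training loss toward whichever stereo channel is currently performing worse, could be sketched roughly as follows. This is a minimal illustrative NumPy sketch, not the paper's exact loss: the function names, the frame-wise binary cross-entropy, and the max-over-channels formulation are all assumptions made here for illustration.

```python
import numpy as np

def channelwise_bce(pred, target, eps=1e-7):
    """Frame-wise binary cross-entropy per channel.

    pred, target: arrays of shape (channels, frames, pitches),
    with pred holding per-pitch activation probabilities."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # Average over the pitch axis -> one loss value per channel per frame.
    return bce.mean(axis=-1)

def worse_channel_loss(pred_left, pred_right, target):
    """Hypothetical sketch of a worse-channel-focused loss.

    For each frame, keep the larger of the two channels' losses, so the
    gradient concentrates on whichever channel is doing worse, rather
    than averaging the channels before the network ever sees them."""
    preds = np.stack([pred_left, pred_right])       # (2, frames, pitches)
    targets = np.stack([target, target])            # same target per channel
    per_channel = channelwise_bce(preds, targets)   # (2, frames)
    return np.maximum(per_channel[0], per_channel[1]).mean()
```

The max over channels is one plausible way to "focus on the worse-performing sound channel"; a softer weighting of the per-channel losses would serve the same purpose.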