An Audio-Visual Fusion Piano Transcription Approach Based on Strategy
Authors: Xianke Wang, Wei Xu, Juanting Liu, Weiming Yang, Wenqing Cheng
Venue: 2021 24th International Conference on Digital Audio Effects (DAFx), published 2021-09-08
DOI: 10.23919/DAFx51585.2021.9768275

Abstract: Piano transcription is a fundamental problem in music information retrieval. Most existing transcription studies rely on audio or video alone, and audio-visual fusion has received comparatively little attention. This paper proposes a piano transcription model based on strategy fusion, in which the transcription results of a video model are used to assist audio transcription. Because suitable datasets for audio-visual transcription are lacking, the OMAPS dataset is introduced. The strategy-fusion model achieves a 92.07% F1 score on OMAPS, and experiments show that it outperforms a comparable model based on feature fusion.
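The abstract does not specify the paper's fusion rule, but strategy (decision-level) fusion of this kind can be illustrated with a minimal sketch: video key-press detections nudge the audio model's per-key probabilities before thresholding. The `strategy_fusion` function, the additive boost rule, and all parameter values below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def strategy_fusion(audio_probs, video_keys, boost=0.2, threshold=0.5):
    """Decision-level (strategy) fusion sketch.

    audio_probs: (frames x keys) array of note probabilities from an
        audio transcription model.
    video_keys: same-shaped binary array of key-press detections from
        a video model.
    Video evidence raises the audio score for the corresponding key;
    the result is thresholded into binary frame-level activations.
    """
    fused = np.clip(audio_probs + boost * video_keys, 0.0, 1.0)
    return (fused >= threshold).astype(int)

# Toy example: 2 frames, 4 keys (a real piano roll would use 88 keys).
audio = np.array([[0.40, 0.90, 0.10, 0.55],
                  [0.45, 0.20, 0.30, 0.60]])
video = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 1]])
print(strategy_fusion(audio, video))
```

Here the borderline audio score 0.40 in the first frame crosses the threshold only because the video model also saw that key pressed, which is the intuition behind letting video results assist the audio decision.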