Jangwon Lee, Bardia Doosti, Yupeng Gu, David Cartledge, David J. Crandall, C. Raphael
{"title":"Observing Pianist Accuracy and Form with Computer Vision","authors":"Jangwon Lee, Bardia Doosti, Yupeng Gu, David Cartledge, David J. Crandall, C. Raphael","doi":"10.1109/WACV.2019.00165","DOIUrl":null,"url":null,"abstract":"We present a first step towards developing an interactive piano tutoring system that can observe a student playing the piano and give feedback about hand movements and musical accuracy. In particular, we have two primary aims: 1) to determine which notes on a piano are being played at any moment in time, 2) to identify which finger is pressing each note. We introduce a novel two-stream convolutional neural network that takes video and audio inputs together for detecting pressed notes and finger presses. We formulate our two problems in terms of multi-task learning and extend a state-of-the-art object detection model to incorporate both audio and visual features. In addition, we introduce a novel finger identification solution based on pressed piano note information. We experimentally confirm that our approach is able to detect pressed piano keys and the piano player's fingers with a high accuracy.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV.2019.00165","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
Abstract
We present a first step towards developing an interactive piano tutoring system that can observe a student playing the piano and give feedback about hand movements and musical accuracy. In particular, we have two primary aims: 1) to determine which notes on a piano are being played at any moment in time, 2) to identify which finger is pressing each note. We introduce a novel two-stream convolutional neural network that takes video and audio inputs together for detecting pressed notes and finger presses. We formulate our two problems in terms of multi-task learning and extend a state-of-the-art object detection model to incorporate both audio and visual features. In addition, we introduce a novel finger identification solution based on pressed piano note information. We experimentally confirm that our approach is able to detect pressed piano keys and the piano player's fingers with a high accuracy.