Clustering scenes in cooking video guided by object access
Yuki Matsumura, Atsushi Hashimoto, Shinsuke Mori, M. Mukunoki, M. Minoh
2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2015-07-30
DOI: 10.1109/ICMEW.2015.7169812 (https://doi.org/10.1109/ICMEW.2015.7169812)
Citation count: 4
Abstract
We propose a method that clusters the scenes in a cooking video by type of food processing, such as cutting or stir-frying. The method first divides the video into segments and extracts a motion feature from each one. The segments are then clustered by the similarity of their motion features. The key point is how to divide the video in the first step. A simple approach is to divide the video into segments of equal length, but this cannot account for the differences among food processing techniques. Instead, we propose dividing the video at object accesses, namely the moments when a chef picks up or puts down an object. We expect the resulting segments to reflect these differences. We compare our method with fixed-length methods on three cooking videos from the KUSK Dataset and evaluate clustering performance.
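The contrast between the two segmentation strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_fixed` and `split_by_access` are hypothetical, and the object-access timestamps are assumed to be supplied by some upstream detector that the abstract does not describe. Feature extraction and clustering are left out.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def split_fixed(duration: float, length: float) -> List[Segment]:
    """Baseline: cut the video into segments of equal length.

    The last segment is shorter when `length` does not divide `duration`.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    segments: List[Segment] = []
    start = 0.0
    while start < duration:
        end = min(start + length, duration)
        segments.append((start, end))
        start = end
    return segments


def split_by_access(duration: float, access_times: List[float]) -> List[Segment]:
    """Cut the video at each object-access moment (pick-up or put-down).

    `access_times` is assumed to come from a separate detector (hypothetical
    input). Duplicates and timestamps outside (0, duration) are ignored.
    """
    cuts = sorted({t for t in access_times if 0.0 < t < duration})
    bounds = [0.0] + cuts + [duration]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # A 10-second clip with accesses detected at 2.5 s and 7.0 s.
    print(split_fixed(10.0, 4.0))
    print(split_by_access(10.0, [7.0, 2.5, 7.0, 12.0]))
```

With fixed lengths, a boundary can fall in the middle of an action; with access-based cuts, each segment spans the interval between two consecutive pick-up or put-down events, so its length follows the action itself.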