Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions
Sven Bambach, Stefan Lee, David J Crandall, Chen Yu
Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1949-1957. DOI: 10.1109/ICCV.2015.226
Hands appear very often in egocentric video, and their appearance and pose give important cues about what people are doing and what they are paying attention to. But existing work in hand detection has made strong assumptions that work well in only simple scenarios, such as with limited interaction with other people or in lab settings. We develop methods to locate and distinguish between hands in egocentric video using strong appearance models with Convolutional Neural Networks, and introduce a simple candidate region generation approach that outperforms existing techniques at a fraction of the computational cost. We show how these high-quality bounding boxes can be used to create accurate pixelwise hand regions, and as an application, we investigate the extent to which hand segmentation alone can distinguish between different activities. We evaluate these techniques on a new dataset of 48 first-person videos of people interacting in realistic environments, with pixel-level ground truth for over 15,000 hand instances.
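To make the detection stage concrete, below is a minimal sketch in Python/PyTorch of the recipe the abstract describes: sample candidate windows from a prior over where hands tend to appear in egocentric frames, score each cropped window with a CNN, and keep high-scoring, non-overlapping boxes. Everything specific here is an illustrative assumption, not the authors' configuration: the paper uses a much stronger ImageNet-pretrained network and a proposal distribution fit to training data, whereas `TinyHandNet`, the single-Gaussian prior, and all hyperparameters below are placeholders.

```python
import numpy as np
import torch
import torch.nn as nn


class TinyHandNet(nn.Module):
    """Stand-in for the paper's CNN window classifier (assumed class
    layout: background + four hand types)."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def sample_proposals(n, frame_w, frame_h, rng):
    """Sample (x, y, w, h) candidate boxes from a spatial prior.
    A single Gaussian stands in for a distribution learned from data."""
    cx = rng.normal(0.5, 0.2, n) * frame_w
    cy = rng.normal(0.65, 0.15, n) * frame_h  # hands sit low in egocentric views
    s = np.exp(rng.normal(np.log(0.2), 0.3, n)) * frame_w
    x = np.clip(cx - s / 2, 0, frame_w - 8)
    y = np.clip(cy - s / 2, 0, frame_h - 8)
    bw = np.clip(s, 8, frame_w - x)
    bh = np.clip(s, 8, frame_h - y)
    return np.stack([x, y, bw, bh], axis=1)


def iou(a, b):
    """IoU between one box a = (x, y, w, h) and an array of boxes b."""
    ix = np.maximum(0, np.minimum(a[0] + a[2], b[:, 0] + b[:, 2]) - np.maximum(a[0], b[:, 0]))
    iy = np.maximum(0, np.minimum(a[1] + a[3], b[:, 1] + b[:, 3]) - np.maximum(a[1], b[:, 1]))
    inter = ix * iy
    return inter / (a[2] * a[3] + b[:, 2] * b[:, 3] - inter + 1e-9)


def nms(boxes, scores, iou_thr=0.3):
    """Greedy non-maximum suppression; returns the kept indices."""
    order = np.argsort(-scores)
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < iou_thr]
    return keep


def detect_hands(frame, model, n_proposals=500, crop=64, seed=0):
    """Score sampled windows with the CNN and keep non-overlapping hits."""
    rng = np.random.default_rng(seed)
    h, w = frame.shape[:2]
    boxes = sample_proposals(n_proposals, w, h, rng)
    crops = []
    for x, y, bw, bh in boxes:
        patch = frame[int(y):int(y + bh), int(x):int(x + bw)]
        # nearest-neighbour resize of the patch to the CNN input size
        yi = np.linspace(0, patch.shape[0] - 1, crop).astype(int)
        xi = np.linspace(0, patch.shape[1] - 1, crop).astype(int)
        crops.append(patch[yi][:, xi])
    batch = torch.from_numpy(np.stack(crops)).permute(0, 3, 1, 2).float() / 255
    with torch.no_grad():
        probs = model(batch).softmax(dim=1).numpy()
    hand_score = 1 - probs[:, 0]  # class 0 assumed to be background
    keep = nms(boxes, hand_score)
    return boxes[keep], hand_score[keep]


if __name__ == "__main__":
    frame = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)  # stand-in frame
    model = TinyHandNet().eval()  # untrained here; in practice, train on labeled crops
    boxes, scores = detect_hands(frame, model)
    print(f"kept {len(boxes)} candidate hand boxes")
```

The key efficiency point from the abstract is visible in the structure: because the prior concentrates proposals where hands plausibly are, far fewer windows need CNN scoring than in exhaustive sliding-window or generic object-proposal schemes.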
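The step from high-quality bounding boxes to accurate pixelwise hand regions can likewise be sketched. Seeding GrabCut with a detected box is one standard way to grow a box into a mask; whether this matches the authors' exact segmentation procedure is an assumption, and `box_to_mask`, its iteration count, and the seeding strategy are illustrative.

```python
import cv2
import numpy as np


def box_to_mask(frame_bgr, box, iters=5):
    """Grow a detected (x, y, w, h) box into a binary hand mask via GrabCut."""
    mask = np.zeros(frame_bgr.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)  # GrabCut's internal GMM buffers
    fgd = np.zeros((1, 65), np.float64)
    rect = tuple(int(v) for v in box)
    cv2.grabCut(frame_bgr, mask, rect, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    # keep pixels labelled definite or probable foreground
    hand = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return (hand * 255).astype(np.uint8)
```

Masks like these are what the activity-recognition experiment in the abstract consumes: the question tested there is whether the silhouettes and layout of the hands alone, with all other appearance discarded, carry enough signal to distinguish activities.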