G. Kapidis, R. Poppe, E. V. Dam, R. Veltkamp, L. Noldus
{"title":"我在哪里?比较CNN和LSTM在自中心视频中的位置分类","authors":"G. Kapidis, R. Poppe, E. V. Dam, R. Veltkamp, L. Noldus","doi":"10.1109/PERCOMW.2018.8480258","DOIUrl":null,"url":null,"abstract":"Egocentric vision is a technology that exists in a variety of fields such as life-logging, sports recording and robot navigation. Plenty of research work focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. The basis of this work is the idea that locations can be characterized by the presence of specific objects. Our objective is the recognition of locations in egocentric videos that mainly consist of indoor house scenes. We perform an extensive comparison between Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based classification methods that aim at finding the in-house location by classifying the detected objects which are extracted with a state-of-the-art object detector. We show that location classification is affected by the quality of the detected objects, i.e., the false detections among the correct ones in a series of frames, but this effect can be greatly limited by taking into account the temporal structure of the information by using LSTM. Finally, we argue about the potential for useful real-world applications.","PeriodicalId":190096,"journal":{"name":"2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)","volume":"242 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Where Am I? Comparing CNN and LSTM for Location Classification in Egocentric Videos\",\"authors\":\"G. Kapidis, R. Poppe, E. V. Dam, R. Veltkamp, L. Noldus\",\"doi\":\"10.1109/PERCOMW.2018.8480258\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Egocentric vision is a technology that exists in a variety of fields such as life-logging, sports recording and robot navigation. Plenty of research work focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. The basis of this work is the idea that locations can be characterized by the presence of specific objects. Our objective is the recognition of locations in egocentric videos that mainly consist of indoor house scenes. We perform an extensive comparison between Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based classification methods that aim at finding the in-house location by classifying the detected objects which are extracted with a state-of-the-art object detector. We show that location classification is affected by the quality of the detected objects, i.e., the false detections among the correct ones in a series of frames, but this effect can be greatly limited by taking into account the temporal structure of the information by using LSTM. Finally, we argue about the potential for useful real-world applications.\",\"PeriodicalId\":190096,\"journal\":{\"name\":\"2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)\",\"volume\":\"242 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PERCOMW.2018.8480258\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PERCOMW.2018.8480258","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Where Am I? Comparing CNN and LSTM for Location Classification in Egocentric Videos
Egocentric vision is a technology that exists in a variety of fields such as life-logging, sports recording and robot navigation. Plenty of research work focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. The basis of this work is the idea that locations can be characterized by the presence of specific objects. Our objective is the recognition of locations in egocentric videos that mainly consist of indoor house scenes. We perform an extensive comparison between Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based classification methods that aim at finding the in-house location by classifying the detected objects which are extracted with a state-of-the-art object detector. We show that location classification is affected by the quality of the detected objects, i.e., the false detections among the correct ones in a series of frames, but this effect can be greatly limited by taking into account the temporal structure of the information by using LSTM. Finally, we argue about the potential for useful real-world applications.