Xiulong Liu, Dongdong Liu, Jiuwu Zhang, Tao Gu, Keqiu Li
RFID and camera fusion for recognition of human-object interactions
Proceedings of the 27th Annual International Conference on Mobile Computing and Networking
Published: 2021-10-19
DOI: 10.1145/3447993.3483244
Citations: 13
Abstract
Recognizing human-object interactions is important in human-centric sensing scenarios such as smart supermarkets, factories, and homes. This paper proposes RF-Camera, a system that fuses RFID and Computer Vision (CV) techniques and is the first work to recognize human gestural interactions with physical objects in multi-subject, multi-object scenarios. In RF-Camera, we first propose a dimension reduction method that transforms the subject's 3D hand trajectory, captured by a depth camera, into a 2D image from which the subject's gesture can be recognized. We also propose a method to extract the facial image of the target subject from an image that may contain irrelevant subjects, and thereby recognize his/her identity. Finally, we model the physical movement of the held object's tag and predict its phase data; comparing the predicted phase data with the real phase data of each tag reveals the human-object matching. Implementing RF-Camera requires addressing three technical challenges. (i) To remove noisy data corresponding to irrelevant actions from the raw sensing data, we propose a state transition diagram that determines the boundaries of the effective data. (ii) To predict the phase data of the held target tag when the hand-tag offset is unknown, we quantify the target tag's trajectory by adding a variable hand-tag vector to the captured hand trajectory. (iii) To ensure high reading rates of target tags in tag-dense scenarios, we propose a CV-assisted RFID scheduling method, in which analytics on CV data help schedule RFID readings. We conduct extensive experiments to evaluate the performance of RF-Camera. Experimental results demonstrate that RF-Camera can recognize gestural actions, human identity, and human-object matching with an average accuracy higher than 90% in most cases.
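The phase-based matching step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard backscatter phase model, in which a tag at distance d from the antenna yields phase (4πd/λ) mod 2π, plus an unknown constant hardware offset that is cancelled here by comparing phase changes relative to the first sample. The wavelength value, the function names, and the grid search over candidate hand-tag offsets are all illustrative assumptions.

```python
import math

TWO_PI = 2.0 * math.pi
WAVELENGTH_M = 0.326  # assumption: carrier near 920 MHz


def predict_phase(tag_pos, antenna_pos, wavelength=WAVELENGTH_M):
    """Phase of a backscattered signal over a round trip of 2*d, wrapped to [0, 2*pi)."""
    d = math.dist(tag_pos, antenna_pos)
    return (4.0 * math.pi * d / wavelength) % TWO_PI


def circular_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    diff = abs(a - b) % TWO_PI
    return min(diff, TWO_PI - diff)


def mismatch(hand_traj, offset, antenna_pos, measured):
    """Mean circular error between predicted and measured phase changes.

    The tag trajectory is modelled as hand position plus a fixed hand-tag
    offset vector. Phases are compared relative to the first sample so that
    a constant hardware phase offset drops out.
    """
    predicted = [
        predict_phase([h + o for h, o in zip(point, offset)], antenna_pos)
        for point in hand_traj
    ]
    errors = [
        circular_diff(p - predicted[0], m - measured[0])
        for p, m in zip(predicted, measured)
    ]
    return sum(errors) / len(errors)


def match_tag(hand_traj, antenna_pos, phases_by_tag, candidate_offsets):
    """Return the tag whose measured phases best fit the hand trajectory.

    For each tag, the hand-tag offset is treated as a free variable and the
    best-fitting candidate offset is kept (a simple grid search, assumed here).
    """
    scores = {
        tag: min(
            mismatch(hand_traj, off, antenna_pos, phases)
            for off in candidate_offsets
        )
        for tag, phases in phases_by_tag.items()
    }
    return min(scores, key=scores.get), scores
```

Here `hand_traj` would come from the depth camera and `phases_by_tag` from the RFID reader, sampled at the same instants. A real system would also need time alignment between the two sensors and tolerance to phase noise, both of which this sketch omits.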