Joint audio-video object localization using a recursive multi-state multi-sensor estimator
Norbert Strobel, S. Spors, R. Rabenstein
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000-06-05
DOI: 10.1109/ICASSP.2000.859324 (https://doi.org/10.1109/ICASSP.2000.859324)
Object localization based on audio and video information is important for analyzing dynamic scenes such as video conferences or traffic situations. In this paper, we treat dynamic audio-video object localization as a joint recursive estimation problem and solve it with a decentralized Kalman filter that fuses audio and video position estimates. To better account for different object maneuvers, multiple state-space equations are also incorporated. The result is a recursive multi-state multi-sensor estimator. Experiments show that it yields significantly improved joint position estimates compared with using either an audio or a video system alone.
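
The abstract does not give the filter equations, so the following is only a minimal sketch of the general idea behind decentralized fusion, not the authors' implementation. It assumes a one-dimensional position, a random-walk motion model, and independent audio and video measurement errors. Each sensor runs its own local Kalman update, and a fusion step adds the new information from each local estimate to the shared prediction. All names (`Estimate`, `predict`, `local_update`, `fuse`) and all numeric values are hypothetical. The multiple state-space models mentioned in the abstract are not covered here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Estimate:
    """A scalar position estimate with its error variance."""
    mean: float
    var: float


def predict(prior: Estimate, process_var: float) -> Estimate:
    """Random-walk time update (an assumed model): mean kept, variance grows."""
    return Estimate(prior.mean, prior.var + process_var)


def local_update(pred: Estimate, z: float, meas_var: float) -> Estimate:
    """Scalar Kalman measurement update run by one sensor's local filter."""
    gain = pred.var / (pred.var + meas_var)
    return Estimate(pred.mean + gain * (z - pred.mean), (1.0 - gain) * pred.var)


def fuse(pred: Estimate, local_estimates: Sequence[Estimate]) -> Estimate:
    """Information-form fusion of local estimates that share one prediction.

    Each local estimate contributes only what it learned beyond the shared
    prediction, so the prior information is not counted more than once.
    """
    info = 1.0 / pred.var              # information (inverse variance)
    info_mean = pred.mean / pred.var   # information-weighted mean
    for est in local_estimates:
        info += 1.0 / est.var - 1.0 / pred.var
        info_mean += est.mean / est.var - pred.mean / pred.var
    return Estimate(info_mean / info, 1.0 / info)


# Hypothetical numbers: the audio measurement is noisier than the video one.
pred = predict(Estimate(0.0, 0.5), process_var=0.5)
audio = local_update(pred, z=1.2, meas_var=1.0)
video = local_update(pred, z=0.9, meas_var=0.25)
joint = fuse(pred, [audio, video])
print(joint)
```

Under these assumptions the fused variance is smaller than either local variance, which matches the qualitative claim that the joint estimate improves on an audio-only or video-only system.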