{"title":"An integrated architecture for surveillance and monitoring in an archaeological site","authors":"E. Ardizzone, M. Cascia, G. Re, M. Ortolani","doi":"10.1145/1099396.1099413","DOIUrl":"https://doi.org/10.1145/1099396.1099413","url":null,"abstract":"This paper describes ongoing work aimed at designing and deploying a system for the surveillance and monitoring of an archaeological site, namely the \"Valley of the Temples\" in Agrigento, Italy. Given the artistic and historical relevance of the site, it is important to protect the monuments from malicious or simply incautious behavior; however, the vastness of the area to be monitored and the vague definition of its boundaries make it impractical to provide extensive coverage through traditional sensors or similar devices. We describe the design of an architecture for the surveillance of the site and for the monitoring of visitors' behavior, consisting of an integrated framework of networked sensors and cameras. Information will be collected by a minimal set of cameras deployed only at critical spots and coupled with higher-performance wireless sensor nodes. Both sets of devices will be supported by more densely deployed, lower-cost wireless sensors, so that the system will fulfill the concurrent goals of being minimally intrusive and remaining both responsive and efficient. 
Sensed data will be processed locally whenever possible and convenient, or otherwise sent to a central intelligent unit that will perform further, more sophisticated analyses using a reasoning system, infer a higher-level representation of the outdoor environment, and finally fine-tune the actions of remote devices.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128535979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","authors":"J. Aggarwal, R. Cucchiara, E. Chang, Yuan-fang Wang","doi":"10.1145/1099396","DOIUrl":"https://doi.org/10.1145/1099396","url":null,"abstract":"Welcome to the 3rd ACM Workshop on Video Surveillance & Sensor Networks -- VSSN'05. Thank you for your interest and participation. This workshop is a continuation of earlier workshops held in Berkeley, CA (2003), which addressed video surveillance only, and New York, NY (2004), which extended the focus to sensor networks. Indeed, there is large interest in developing and deploying large-scale networks of sensors -- optical sensors, including cameras, or non-optical sensors, including electrical, thermal, chemical, and biological devices -- for surveillance. Following the success of the past events, the current workshop explores the state of the art of research activity in multi-camera, distributed video surveillance systems, sensor networks, and their integration. The aim is to create the infrastructure for new platforms of multimedia surveillance systems that integrate multiple sources and multimedia data (video, audio, digital sensor signals..., annotated and textual knowledge) to improve the effectiveness of surveillance systems. The call for papers attracted submissions from a number of countries, including the USA, Canada, the UK, Italy, Finland, Ireland, Germany, the Republic of Singapore, and Hong Kong. The program committee accepted 15 papers. The workshop topics include algorithms for image and video processing for surveillance, covering camera calibration, mosaicing, background creation, and trajectory classification; and architectures of large integrated multi-camera systems and sensor networks for different applications, including indoor and outdoor surveillance, public park monitoring, archaeological site control, virtual reality for multi-camera system simulation, and models and algorithms for the coordination of fixed and active cameras. 
Special attention is paid to PTZ cameras, their scheduling and calibration, and information fusion. Several researchers explore future directions in the integration of biometric and surveillance paradigms, and in active camera control for people detection, tracking, and identification. Mubarak Shah is the invited speaker. The title of his talk is \"Recognizing Human Actions.\" It will discuss general issues of computer vision for the detection and interpretation of human actions. In addition, this workshop includes a competition of open-source algorithms for foreground/background segmentation. Four approaches and their C++ implementations will be compared against the existing code available in the OpenCV library.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"67 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120924479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian background modeling for foreground detection","authors":"F. Porikli, Oncel Tuzel","doi":"10.1145/1099396.1099407","DOIUrl":"https://doi.org/10.1145/1099396.1099407","url":null,"abstract":"We propose a Bayesian learning method to capture the background statistics of a dynamic scene. We model each pixel as a set of layered normal distributions that compete with each other. Using a recursive Bayesian learning mechanism, we estimate not only the mean and variance but also the probability distribution of the mean and covariance of each model. This learning algorithm preserves the multimodality of the background process and is capable of estimating the number of required layers to represent each pixel.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133605117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
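The abstract above describes a per-pixel background model with layered normal distributions learned recursively. As a much-simplified illustration of the per-pixel idea only -- a single running Gaussian per pixel, not the authors' multi-layer recursive Bayesian scheme -- with the learning rate `lr` and threshold `k` being our assumed parameters:

```python
import numpy as np

def update_background(frame, mean, var, lr=0.05, k=2.5):
    """One step of a per-pixel running-Gaussian background model.

    frame, mean, var: float arrays of the same shape (grayscale).
    Returns (foreground_mask, new_mean, new_var).
    """
    # A pixel is foreground if it deviates more than k standard
    # deviations from the background mean.
    fg = np.abs(frame - mean) > k * np.sqrt(var)
    # Update the model only where the pixel matched the background,
    # using an exponential moving average.
    bg = ~fg
    new_mean = np.where(bg, (1 - lr) * mean + lr * frame, mean)
    new_var = np.where(bg, (1 - lr) * var + lr * (frame - mean) ** 2, var)
    return fg, new_mean, np.maximum(new_var, 1e-6)
```

Called once per incoming frame, this preserves a slowly adapting background; the paper's contribution is precisely what this sketch omits, namely multiple competing layers per pixel and distributions over the model parameters themselves.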
{"title":"Automatic pan-tilt-zoom calibration in the presence of hybrid sensor networks","authors":"C. Wren, U. M. Erdem, A. Azarbayejani","doi":"10.1145/1099396.1099418","DOIUrl":"https://doi.org/10.1145/1099396.1099418","url":null,"abstract":"Wide-area context awareness is a crucial enabling technology for next-generation smart buildings and surveillance systems. It is not practical to cover an entire building with cameras, yet it is difficult to infer missing information when there are significant gaps in coverage. As a solution, we advocate a class of hybrid perceptual systems that build a comprehensive model of activity in a large space, such as a building, by merging contextual information from a dense network of ultra-lightweight sensor nodes with video from a sparse network of high-capability sensors. In this paper we explore the task of automatically recovering the relative geometry between a pan-tilt-zoom camera and a network of one-bit motion detectors. We present results for the recovery of geometry alone, and also for the recovery of geometry jointly with simple activity models. Because we do not believe a metric calibration is necessary, or even entirely useful, for this task, we formulate and pursue the novel goal we term functional calibration. Functional calibration is the blending of geometry estimation and simple behavioral model discovery. 
Accordingly, results are evaluated in terms of the ability of the system to automatically foveate targets in a large, non-convex space, not in terms of pixel reconstruction error.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122064948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards event detection in an audio-based sensor network","authors":"A. Smeaton, Mike McHugh","doi":"10.1145/1099396.1099414","DOIUrl":"https://doi.org/10.1145/1099396.1099414","url":null,"abstract":"In this paper, we describe an experiment where we gathered audio information from a series of conventional wired microphones installed in a typical university setting. We also obtained visual information from cameras located in the same area. We set out to see if audio analysis could be used to assist our existing visual event detection system, and to note any improvements. We were not concerned with identifying or classifying what was detected in the audio. Our aim was to keep audio processing to a minimum, as this would enable wireless sensor networks to be used in the future. We present the results of analysis of audio information based on the mean of the volume, the zero-crossing rate, and the frequency. We found that detecting events based on their volume returned satisfactory results.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116321282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
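The volume, zero-crossing-rate, and frequency features described above are deliberately cheap, so they could later run on wireless sensor nodes. A sketch of how the first two might be computed and thresholded -- the frame length and the median-based threshold are our illustrative choices, not values from the paper:

```python
import numpy as np

def frame_features(signal, frame_len=1024):
    """Split a mono signal into fixed-length frames and compute,
    per frame, the RMS volume and the zero-crossing rate."""
    n = len(signal) // frame_len
    frames = np.reshape(signal[: n * frame_len], (n, frame_len))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Fraction of consecutive sample pairs whose sign differs.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return rms, zcr

def detect_events(rms, factor=3.0):
    """Flag frames whose volume exceeds `factor` times the median
    volume -- a simple volume-based detector in the spirit of the
    paper's findings (the threshold rule is our assumption)."""
    return rms > factor * np.median(rms)
```

The median makes the threshold robust to the events themselves, so a short loud burst in an otherwise quiet recording stands out.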
{"title":"Multimedia surveillance systems","authors":"R. Cucchiara","doi":"10.1145/1099396.1099399","DOIUrl":"https://doi.org/10.1145/1099396.1099399","url":null,"abstract":"The integration of video technology and sensor networks constitutes the fundamental infrastructure for new generations of multimedia surveillance systems, where many different media streams (audio, video, images, textual data, sensor signals) will concur to provide an automatic analysis of the controlled environment and a real-time interpretation of the scene. New solutions can be devised to enlarge the view of traditional surveillance systems by means of distributed architectures with fixed and active cameras, to enhance their view with other sensed data, and to explore multi-resolution views with zooming and omnidirectional cameras. Applications concern the surveillance of wide indoor and outdoor areas, and particularly people surveillance: in this case, multimedia surveillance systems can be enriched with biometric technology; the best views of detected persons and their extracted visual features (e.g. faces, voices, trajectories) can be exploited for people identification. VSSN05 is the third edition of the workshop, co-located with the ACM Multimedia Conference, that embraces research reports on video surveillance and, since the 2004 edition, sensor networks. 
This paper gives a short overview of the hot topics in multimedia surveillance systems and introduces some of the research activities currently under way around the world and presented at VSSN05.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126199136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An image mosaicing module for wide-area surveillance","authors":"Marko Heikkilä, M. Pietikäinen","doi":"10.1145/1099396.1099400","DOIUrl":"https://doi.org/10.1145/1099396.1099400","url":null,"abstract":"This paper presents a fully automatic image mosaicing method for the needs of wide-area video surveillance. A purely feature-based approach was adopted for finding the registration between the images. This approach provides several advantages: our method is robust against illumination variations, moving objects, image rotation, image scaling, and imaging noise, and is relatively fast to compute. We have tested the performance of the proposed method on several video sequences captured from real-world scenes. The results clearly justify our approach.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125836481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
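In a feature-based registration pipeline like the one described above, matched feature points ultimately feed a homography solver. As a sketch of that solving step only -- the paper's feature detection and matching are not reproduced here; this is the standard direct linear transform, without the point normalization and RANSAC outlier rejection a production system would add:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst (Nx2 arrays,
    N >= 4) with the direct linear transform (DLT). Each point pair
    contributes two linear constraints on the 9 entries of H."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the null-space direction of A: the right
    # singular vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale ambiguity
```

With exact correspondences the recovered H is exact up to numerical precision; with noisy matches it is the least-squares fit, which is where normalization and RANSAC earn their keep.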
{"title":"Timeline-based information assimilation in multimedia surveillance and monitoring systems","authors":"P. Atrey, M. Kankanhalli, R. Jain","doi":"10.1145/1099396.1099416","DOIUrl":"https://doi.org/10.1145/1099396.1099416","url":null,"abstract":"Most surveillance and monitoring systems nowadays utilize multiple types of sensors. However, due to the asynchrony among and diversity of sensors, information assimilation -- combining the information obtained from asynchronous and multifarious sources -- is an important and challenging research problem. In this paper, we propose a hierarchical probabilistic method for information assimilation in order to detect events of interest in a surveillance and monitoring environment. The proposed method adopts a bottom-up approach and performs assimilation of information at three different levels: the media-stream level, the atomic-event level, and the compound-event level. To detect an event, our method uses not only the current media streams but also two of their important properties: first, the accumulated history of whether they have provided concurring or contradictory evidence, and second, the system designer's confidence in them. A compound event, which comprises two or more atomic events, is detected by first estimating probabilistic decisions for the atomic events based on the individual streams, and then by aligning these decisions along a timeline and hierarchically assimilating them. 
The experimental results show the utility of our method.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121775805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
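The intuition behind weighting each stream by designer confidence and accumulated agreement history can be illustrated with a simple linear opinion pool -- this is our illustrative sketch, not the authors' hierarchical, timeline-aligned method:

```python
def fuse_decisions(probs, confidences, histories):
    """Fuse per-stream probabilities that an atomic event occurred.

    probs       -- each stream's estimated event probability
    confidences -- the designer's confidence in each stream
    histories   -- each stream's accumulated agreement score
    The three lists are parallel; each stream's weight is the product
    of its confidence and history, normalized to sum to one.
    """
    weights = [c * h for c, h in zip(confidences, histories)]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total
```

A stream with a poor track record (low history score) is thus discounted even if the designer initially trusted it, which mirrors the paper's use of past concurrence alongside designer confidence.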
{"title":"Classifying spatiotemporal object trajectories using unsupervised learning of basis function coefficients","authors":"S. Khalid, A. Naftel","doi":"10.1145/1099396.1099404","DOIUrl":"https://doi.org/10.1145/1099396.1099404","url":null,"abstract":"This paper proposes a novel technique for the clustering and classification of object trajectory-based video motion clips using spatiotemporal functional approximations. Motion trajectories are treated as time series and modeled using the leading Fourier coefficients obtained by a Discrete Fourier Transform. Trajectory clustering is then carried out in the Fourier-coefficient feature space to discover patterns of similar object motions. The coefficients of the basis functions are used as input feature vectors to a Self-Organising Map, which can learn similarities between object trajectories in an unsupervised manner, and a Mahalanobis classifier is then used for the detection of anomalous trajectories. Encoding trajectories in this way leads to efficiency gains over existing approaches that use discrete point-based flow vectors to represent the whole trajectory. Experiments are performed on two different datasets -- synthetic and pedestrian object tracking -- to demonstrate the effectiveness of our approach. 
Applications to motion data mining in video surveillance databases are envisaged.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130626017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
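Representing a trajectory by its leading DFT coefficients, as described above, can be sketched as follows. The complex x + iy encoding and the number of coefficients kept are our illustrative choices; the paper's SOM and Mahalanobis stages are not reproduced:

```python
import numpy as np

def trajectory_features(traj, n_coeffs=4):
    """Compress a 2-D trajectory (Nx2 array of x, y positions) into
    the leading Fourier coefficients of its complex form x + iy.
    Returns a real feature vector (real and imaginary parts of the
    first n_coeffs DFT coefficients), suitable as input to a
    clusterer such as a Self-Organising Map."""
    z = traj[:, 0] + 1j * traj[:, 1]
    coeffs = np.fft.fft(z)[:n_coeffs] / len(z)  # normalize by length
    return np.concatenate([coeffs.real, coeffs.imag])
```

Because only a handful of coefficients survive, similar paths map to nearby feature vectors regardless of trajectory length, which is the efficiency gain over raw point-based flow vectors that the abstract claims.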
{"title":"Recognizing human actions","authors":"M. Shah","doi":"10.1145/1099396.1099397","DOIUrl":"https://doi.org/10.1145/1099396.1099397","url":null,"abstract":"Recognition of human actions from video sequences is a very active area of research in Computer Vision. An important step in any action recognition approach is the extraction of useful information from raw video data and its subsequent representation. The representation should account for the variability that arises when arbitrary cameras capture humans performing actions. The UCF Computer Vision group has been very active in the action recognition area. In this talk, I will present our action recognition work employing a variety of representations: a single point, anatomical landmarks on the human body, and the complete contour of the human body. I will also explicitly identify three important sources of variability -- (1) viewpoint, (2) execution rate, and (3) anthropometry of actors -- and propose a model of human actions that allows us to address all three. Our hypothesis is that the variability associated with the execution of an action can be closely approximated by a linear combination of action bases in joint spatio-temporal space. 
We demonstrate that such a model bounds the rank of a matrix of image measurements and that this bound can be used to achieve recognition of actions based only on imaged data.","PeriodicalId":196499,"journal":{"name":"Proceedings of the third ACM international workshop on Video surveillance & sensor networks","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133191203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}