{"title":"Real-time video surveillance for traffic monitoring using virtual line analysis","authors":"Belle L. Tseng, Ching-Yung Lin, John R. Smith","doi":"10.1109/ICME.2002.1035671","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035671","url":null,"abstract":"A real-time video surveillance is presented for traffic monitoring of vehicle volume on major highways. Determining traffic volume automatically and in real-time assists drivers to dynamically plan their trips more efficiently. Our traffic monitoring system uses the virtual line graph to facilitate the detection of vehicles, classification of vehicle types, tracking of individual vehicles, and subsequently an accurate count of the number of vehicles. The virtual line analyzer detects vehicles as they cross a virtual boundary. The goal of this traffic monitoring system is to provide a real-time and accurate vehicle counter while taking advantage of stationary Web-cams, fixed highways and lanes, and deterministic vehicle characteristics.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"183 1","pages":"541-544 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75678916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Loss concealments of subband coded images for real-time transmissions in the Internet","authors":"B. Wah, Xiao Su","doi":"10.1109/ICME.2002.1035637","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035637","url":null,"abstract":"Subband-coded images can be transmitted in the Internet using either the TCP or the UDP protocol. Due to the fact that TCP employs congestion control and retransmissions and that UDP and TCP packets are treated differently by routers, TCP has much longer delays than those of UDP to deliver an image, but packet losses in UDP may lead to poor decoding quality if the image is single-description coded (SDC) and the losses cannot be concealed. We study the use of UDP to deliver multi-description coded (MDC) reconstruction-based subband-transformed (ST) images and the reconstruction of missing information at a receiver based on information received. To facilitate recovery from UDP packet losses, we propose a joint sender-receiver approach for designing optimized reconstruction-based subband transform (ORB-ST) in MDC. We then carefully evaluate the delay-quality trade-offs between the TCP delivery of SDC images and the UDP and combined TCP/UDP delivery of MDC images. Experimental results show that our proposed ORB-ST performs well in real Internet tests, and UDP and combined TCP/UDP delivery of MDC images provide a range of attractive alternatives to TCP delivery.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"22 1","pages":"449-452 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73673058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic topic identification in multimedia broadcast data","authors":"S. Werner, U. Iurgel, A. Kosmala, G. Rigoll","doi":"10.1109/ICME.2002.1035713","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035713","url":null,"abstract":"This paper presents a system that automatically scans multimedia data like TV or radio broadcasts for the presence of specific topics and, whenever topics of users' interests are detected, alerts the related user. Our current work on the three main modules of the system is shown. (1) The speech recognition system (with 18.7 % WER) is already among the most advanced German broadcast speech recognition systems. (2) The innovative topic identification approach, which is especially designed to work on the output of a speech recognizer, is compared to a standard text based approach. (3) The topic segmentation module has a good performance detecting not only scene cuts or speaker turns, but also real topic boundaries.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"2 1","pages":"41-44 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74043795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A method for 3D face modeling and caricatured figure generation","authors":"T. Fujiwara, H. Koshimizu, K. Fujimura, G. Fujita, Y. Noguchi, N. Ishikawa","doi":"10.1109/ICME.2002.1035531","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035531","url":null,"abstract":"This paper proposes a method for modeling a subject with a 3D face from 2D facial images captured from surrounding 2D cameras which synchronously measure the color, texture and surface shape information of the face. The 3D facial caricaturing method uses a 3D (polygon data) face model. An automatic method for extracting regions of the facial parts is technically proposed, and the feature points are extracted from those regions. We propose a wire frame model composed of 44 feature points and 82 polygons to cover a head. To generate a caricature from the polygon data, the individuality feature is defined by the difference of the feature points between the input face and the mean face, which was defined from the average of many input faces. We also propose a method for producing the 3D figure of a human facial caricature.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"22 1","pages":"137-140 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74235846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast marching methods applied to face location in videophone applications using colour information","authors":"P. Sharma, R. Reilly","doi":"10.1109/ICME.2002.1035532","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035532","url":null,"abstract":"A new method is proposed to automatically segment out a person's face from a given sequence of images that consists of a head-and-shoulder view, using the fast marching level set approach. The method proposed involves a fast, reliable and computationally efficient algorithm, which exploits the colour information in an image to segment the face region. The colour information is derived and used to classify pixels in the image into different levels depending on the value of the colour information for that pixel. Once this is achieved, a priority-labelled fast marching level set algorithm is utilised to locate the face region. The performance of the fast marching face-segmentation algorithm is illustrated by some simulation results carried out on head-and-shoulder test images.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"59 1","pages":"141-144 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75900001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video portal for a media space of structured video streams","authors":"T. Ogura, N. Babaguchi, T. Kitahashi","doi":"10.1109/ICME.2002.1035588","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035588","url":null,"abstract":"The necessity for a video portal which supports an access to a specific video or its part from a huge video media space consisting of a large number of structured video streams is increasing. The video portal for a media space can be defined as a system which helps us to access videos through various kinds of media segments such as image, text, audio and video segments. In order to construct the video portal, we deal with structured video with metadata. Our system defines the structure of video in terms of document type definition (DTD), and describes the metadata of video with XML according to the DTD. We propose the video portal for structured video streams for the sports domain, and verify its usefulness based on a prototype system's performance.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"46 1","pages":"309-312 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77884963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An optimal interpolation-based scheme for video summarization","authors":"N. Doulamis, A. Doulamis, K. Ntalianis","doi":"10.1109/ICME.2002.1035777","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035777","url":null,"abstract":"In this paper, an optimal and efficient algorithm for video summarization is proposed by exploiting temporal variations of video visual content. In particular, the most characteristic frames/shots (key-frames/shots) are extracted by estimating appropriate points on the feature vector curve, which represent in an optimal way the corresponding trajectory. This is performed by minimizing the approximation error of the feature vector curve and the respective curve formed by the estimated points using an interpolation scheme. A genetic algorithm is used for the minimization task, since the complexity of an exhaustive search is too large to be implemented. Furthermore, a fast technique for increasing the number of extracted key-frames/shots is presented.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"13 1","pages":"297-300 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84920298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An evaluation of sound source identification with RWCP sound scene database in real acoustic environments","authors":"T. Nishiura, Satoshi Nakamura","doi":"10.1109/ICME.2002.1035570","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035570","url":null,"abstract":"It is very important for a hands-free speech interface to capture distant speech with high quality. A microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. Conventional talker localization methods in multiple sound source environments not only have difficulty localizing the multiple sound sources accurately, but also have difficulty localizing the target talker among known multiple sound source positions. To cope with these problems, we propose a new talker localization method consisting of two algorithms. One algorithm is for multiple sound source localization based on CSP (cross-power spectrum phase) analysis. The other algorithm is for sound source identification among localized multiple sound sources towards talker localization. We particularly focus on the latter statistical sound source identification among localized multiple sound sources with statistical speech and environmental sound models based on GMMs (Gaussian mixture models) and a microphone array towards talker localization. We especially evaluate the performance of the proposed algorithms with the RWCP sound scene database in real acoustic environments (RWCP-DB).","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"55 1","pages":"265-268 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85229162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wavelet Kalman based reconstruction","authors":"A. David","doi":"10.1109/ICME.2002.1035881","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035881","url":null,"abstract":"The analysis and synthesis operations of commonly used families of the discrete wavelet transform (DWT) are modeled within the framework of an ordinary Kalman filter (KF). In particular, the synthesis operation is regarded as the state evolution from a coarse subspace to a finer one. The analysis operation is considered as a partially observable process. It is shown that such descriptions provide for a natural and compact representation of the underlying signals. Extensions to image reconstruction and compression applications are demonstrated.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"4 6 1","pages":"713-716 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85737603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Watermark detection algorithm using statistical decision theory","authors":"Seong-Geun Kwon, Suk-Hwan Lee, Kee-Koo Kwon, Ki-Ryong Kwon, Kuhn-Il Lee","doi":"10.1109/ICME.2002.1035843","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035843","url":null,"abstract":"Watermark detection plays a crucial role in multimedia copyright protection and has traditionally been tackled using correlation-based algorithms. However, correlation-based detection is not actually the best choice, as it does not utilize the distributional characteristics of the image being marked. Accordingly, an efficient watermark detection scheme for DWT coefficients is proposed as optimal for non-additive schemes. Based on the statistical decision theory, the proposed method is derived according to Bayes' decision theory, the Neyman-Pearson criterion, and the distribution of the DWT coefficients, thereby minimizing the missed detection probability subject to a given false alarm probability. The proposed method has been tested in the context of robustness, and the results confirm the superiority of the proposed technique over conventional correlation-based detection methods.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"56 1","pages":"561-564 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85977777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}