2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03): Latest Articles, Page 3

Discrete probability density estimation using multirate DSP models

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1201722

P. Vaidyanathan, Byung-Jun Yoon

Citations: 6

A new real-time pattern selection algorithm for very low bit-rate video coding focusing on moving regions

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1199495

M. Paul, M. Murshed, L. Dooley

Citations: 6

In-car speech recognition using distributed microphones-adapting to automatically detected driving conditions

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1198783

Hideki Banno, Tetsuya Shinde, K. Takeda, F. Itakura

Citations: 5

On the least squares signal approximation model for overdecimated rational nonuniform filter banks and applications

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1201723

A. Tkacenko, P. Vaidyanathan

Citations: 3

Audio-visual synchrony for detection of monologues in video archives

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1200085

G. Iyengar, H. Nock, C. Neti

Abstract: We present our approach to detecting monologues in video shots. A monologue shot is defined as a shot containing a talking person in the video channel with the corresponding speech in the audio channel. Whilst motivated by the TREC 2002 Video Retrieval Track (VT02), the underlying approach of synchrony between audio and video signals is also applicable for voice and face-based biometrics, assessing lip-synchronization quality in movie editing, and for speaker localization in video. Our approach is envisioned as a two part scheme. We first detect the occurrence of speech and face in a video shot. In shots containing both speech and a face, we distinguish monologue shots as those shots where the speech and facial movements are synchronized. To measure the synchrony between speech and facial movements we use a mutual-information based measure. Experiments with the VT02 corpus indicate that using synchrony, the average precision improves by more than 50% relative compared to using face and speech information alone. Our synchrony based monologue detector submission had the best average precision performance (in VT02) amongst 18 different submissions.
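The abstract above names a mutual-information based synchrony measure but does not say how it is estimated. As a rough illustration only, the sketch below assumes one scalar audio feature and one scalar visual feature per frame and models them as jointly Gaussian, in which case the mutual information reduces to -0.5 * ln(1 - rho^2), where rho is their correlation. The function names and the synthetic feature sequences are hypothetical and are not taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y):
    """Mutual information in nats, assuming x and y are jointly Gaussian."""
    rho2 = min(pearson(x, y) ** 2, 1.0 - 1e-12)  # clip so the log stays finite
    return -0.5 * math.log(1.0 - rho2)


# Synthetic stand-ins for per-frame features (assumption, not real data).
random.seed(0)
audio = [random.gauss(0, 1) for _ in range(2000)]
synced = [a + 0.5 * random.gauss(0, 1) for a in audio]   # moves with the audio
unsynced = [random.gauss(0, 1) for _ in range(2000)]     # independent of it

mi_synced = gaussian_mi(audio, synced)
mi_unsynced = gaussian_mi(audio, unsynced)
```

Under these assumptions a shot whose visual feature tracks the audio scores a much higher value than one whose visual feature is independent of it, which is the property a synchrony-based detector would threshold on.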

Citations: 23

Optimal sampling functions in nonuniform sampling driver designs to overcome the Nyquist limit

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1201667

F. Papenfuß, Y. Artyukh, E. Boole, D. Timmermann

Abstract: In some applications the observed samples are inherently nonuniform. In contrast to that in this paper we take advantage of deliberate nonuniform sampling and perform DSP where the classical approaches leave off. For instance think about mobile communication or digital radio. Deliberate nonuniform sampling promises increased equivalent sampling rates with reduced overall hardware costs. The equivalent sampling rate is the sampling rate that a uniform sampling device would require in order to achieve the same processing bandwidth. While the equivalent bandwidth of a realizable system may well extend into the GHz range its mean sampling rate is usually in the MHz range. Current existing prototype systems achieve 40 times the bandwidth of a classic DSP system that would operate uniformly (Artyukh et al. (1997)). Throughout the literature on nonuniform sampling (e.g. Bilinskis et al. (1992), Marvasti (2001), and Wojtiuk (2000)) many sampling schemes have been investigated. In this paper the authors discuss a nonuniform sampling scheme that is especially suited to be implemented in digital devices, thus, fully exploiting state-of-the-art ADC without violating their specifications. An analysis of the statistical properties of the algorithm is given to demonstrate common pitfalls and to prove its correctness.

Citations: 7

Evidence-based object tracking via global energy maximization

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1199521

J. Carter, P. Lappas, R. Damper

Citations: 4

Content-adaptive filtering in the UMCTF framework

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1199551

D. Turaga, M. Schaar

Citations: 20

Analysis and reduction of reference frames for motion estimation in MPEG-4 AVC/JVT/H.264

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1199128

Yu-Wen Huang, Bing-Yu Hsieh, Tu-Chih Wang, Shao-Yi Chien, Shyh-Yih Ma, Chun-Fu Shen, Liang-Gee Chen

Citations: 74

Statistical shape theory for activity modeling

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Pub Date : 2003-07-06 DOI: 10.1109/ICASSP.2003.1199519

Namrata Vaswani, A. Roy-Chowdhury, R. Chellappa

Abstract: Monitoring activities in a certain region from video data is an important surveillance problem. The goal is to learn the pattern of normal activities and detect unusual ones by identifying activities that deviate appreciably from the typical ones. We propose an approach using statistical shape theory based on the shape model of D.G. Kendall et al. (see "Shape and Shape Theory", John Wiley and Sons, 1999). In a low resolution video, each moving object is best represented as a moving point mass or particle. In this case, an activity can be defined by the interactions of all or some of these moving particles over time. We model this configuration of the particles by a polygonal shape formed from the locations of the points in a frame and the activity by the deformation of the polygons in time. These parameters are learned for each typical activity. Given a test video sequence, an activity is classified as abnormal if the probability for the sequence (represented by the mean shape and the dynamics of the deviations), given the model, is below a certain threshold. The approach gives very encouraging results in surveillance applications using a single camera and is able to identify various kinds of abnormal behavior.
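The abstract above represents the moving points in a frame as a polygonal shape in Kendall's sense, meaning the configuration with translation, scale, and rotation removed. The sketch below is only an illustration of that idea for planar points, using the standard full Procrustes distance between two configurations; the paper's actual probability model and its parameters are not reproduced here, and the function names and example polygons are hypothetical.

```python
import math


def preshape(points):
    """Centre a planar configuration and scale it to unit norm (complex form)."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]


def procrustes_distance(a, b):
    """Full Procrustes distance between two planar configurations.

    The value is 0 when b is a shifted, scaled, rotated copy of a, and it
    grows as the polygon deforms. Points are matched by their list order.
    """
    za, zb = preshape(a), preshape(b)
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# The same square rotated by 90 degrees, scaled by 3, and shifted by (5, 2).
moved = [(5 - 3 * y, 2 + 3 * x) for x, y in square]
# A genuinely different shape: the square stretched into a 3-by-1 rectangle.
stretched = [(0, 0), (3, 0), (3, 1), (0, 1)]

d_same = procrustes_distance(square, moved)
d_deformed = procrustes_distance(square, stretched)
```

Because the distance ignores where the group of points is, how large it is, and how it is oriented, only a change in the relative arrangement of the points registers as a deformation.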

Citations: 9