2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03): latest papers

Discrete probability density estimation using multirate DSP models
P. Vaidyanathan, Byung-Jun Yoon
DOI: 10.1109/ICASSP.2003.1201722 (https://doi.org/10.1109/ICASSP.2003.1201722)
Publication date: 2003-07-06
Abstract: We propose a model-based approach for estimating the probability mass functions of discrete random variables. The model is built on tools from multirate signal processing. Similar in principle to kernel-based methods, the approach takes advantage of well-known results from multirate signal processing theory. Similarities to and differences from wavelet-based approaches are also indicated where appropriate. In the final form, the probability estimates are obtained by filtering the square root of the histogram through a multirate system whose components are biorthogonal partners of each other.
Citations: 6
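
The last step of the abstract (filter the square root of the histogram, then square) can be illustrated with a minimal sketch. It assumes the simplest filter pair I know of that satisfies the biorthogonal-partner condition [H(z)F(z)]↓M = 1: an M-tap averaging analysis filter followed by decimation, and an M-fold expander followed by an M-tap hold. This is an illustrative choice, not the filters designed in the paper, and the function name estimate_pmf is hypothetical.

```python
import math

def estimate_pmf(samples, num_bins, M):
    """Sketch: smooth sqrt(histogram) with an averaging/hold partner pair, square, renormalise.

    samples  : integer outcomes in range(num_bins)
    M        : decimation factor (assumed to divide num_bins in this sketch)
    """
    if num_bins % M != 0:
        raise ValueError("num_bins must be a multiple of M in this sketch")
    counts = [0] * num_bins
    for x in samples:
        counts[x] += 1
    n = len(samples)
    root = [math.sqrt(c / n) for c in counts]  # square root of the normalised histogram
    # Analysis filter H: average each block of M bins, keeping one value per block (decimate by M).
    coarse = [sum(root[k * M:(k + 1) * M]) / M for k in range(num_bins // M)]
    # Synthesis filter F: expand by M and hold each coefficient for M bins.
    smooth_root = [coarse[i // M] for i in range(num_bins)]
    # Squaring guarantees non-negative estimates; renormalise so they sum to one.
    p = [r * r for r in smooth_root]
    total = sum(p)
    return [v / total for v in p]
```

For example, estimate_pmf([0, 0, 0, 1, 2, 2, 3, 3], 4, 2) pools bins 0 and 1 (and bins 2 and 3) into equal estimates. Working on the square root rather than on the histogram itself keeps the squared output non-negative without any clipping.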
A new real-time pattern selection algorithm for very low bit-rate video coding focusing on moving regions
M. Paul, M. Murshed, L. Dooley
DOI: 10.1109/ICASSP.2003.1199495 (https://doi.org/10.1109/ICASSP.2003.1199495)
Publication date: 2003-07-06
Abstract: Very low bit-rate video coding that uses regular-shaped patterns to focus on moving regions in macroblocks has recently gained significant attention. This paper presents a new real-time pattern selection (RTPS) algorithm using a large codebook of thirty-two patterns. The algorithm computes a relevance measure between every pattern and the moving region to eliminate a large number of irrelevant patterns before the actual best-likelihood pattern selection. It is shown both theoretically and empirically that the computational complexity of the new algorithm is comparable to that of the contemporary algorithm using a codebook of only eight patterns, while the new algorithm reduces the bit-rate significantly at comparable subjective quality.
Citations: 6
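
The two-stage idea in the abstract (discard irrelevant patterns first, then run the more expensive best-match search only on the survivors) can be sketched as follows. Everything concrete here is hypothetical: the four 8x8 quadrant patterns stand in for the paper's 32-pattern codebook, relevance is taken as the fraction of moving pixels covered by a pattern, and the final choice minimises the number of mismatched pixels.

```python
def square_pattern(r0, c0, size=8):
    """Hypothetical 64-pixel pattern: an 8x8 square inside a 16x16 macroblock."""
    return frozenset((r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size))

# Hypothetical stand-in for the paper's codebook (which has 32 patterns).
PATTERNS = {
    "top_left": square_pattern(0, 0),
    "top_right": square_pattern(0, 8),
    "bottom_left": square_pattern(8, 0),
    "bottom_right": square_pattern(8, 8),
}

def select_pattern(moving, patterns, min_relevance=0.25):
    """Prune low-relevance patterns, then pick the survivor with the fewest mismatched pixels.

    moving        : set of (row, col) pixels flagged as moving in the macroblock
    min_relevance : hypothetical pruning threshold on covered fraction of moving pixels
    Returns (best_pattern_name or None, list_of_surviving_names).
    """
    if not moving:
        return None, []
    # Stage 1: cheap relevance test removes patterns that barely overlap the moving region.
    survivors = [name for name, pix in patterns.items()
                 if len(pix & moving) / len(moving) >= min_relevance]
    if not survivors:
        return None, survivors
    # Stage 2: full comparison (symmetric difference) only on the remaining patterns.
    best = min(survivors, key=lambda name: len(patterns[name] ^ moving))
    return best, survivors
```

The saving comes from stage 1 being a single intersection count per pattern, so a large codebook costs little more than a small one when most patterns are pruned.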
In-car speech recognition using distributed microphones: adapting to automatically detected driving conditions
Hideki Banno, Tetsuya Shinde, K. Takeda, F. Itakura
DOI: 10.1109/ICASSP.2003.1198783 (https://doi.org/10.1109/ICASSP.2003.1198783)
Publication date: 2003-07-06
Abstract: In this paper, we describe a multichannel method for noisy speech recognition that can adapt to various in-car noise situations during driving. The method estimates the log spectrum of speech at a close-talking microphone by multiple regression of the log spectra (MRLS) of noisy signals captured by multiple distributed microphones. By clustering the spatial noise distributions under various driving conditions, the MRLS regression weights are effectively adapted to the driving conditions. Experimental evaluation shows an average error rate reduction of 43% in isolated word recognition under 15 different driving conditions.
Citations: 5
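
Multiple regression of log spectra amounts to an ordinary least-squares fit, per frequency bin, of the close-talking log power on the log powers seen by the distributed microphones. The sketch below assumes one bin, a bias term, and the normal equations solved by plain Gaussian elimination; the function names are hypothetical. Adapting to driving conditions, as the abstract describes, would correspond to fitting one weight set per noise cluster.

```python
def solve(A, b):
    """Solve A x = b for a small, nonsingular dense system (Gaussian elimination, partial pivoting)."""
    n = len(A)
    m = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_mrls_weights(mic_logspec, target_logspec):
    """Least-squares regression weights for one frequency bin.

    mic_logspec    : list of frames, each a list of L log-power values (one per distributed mic)
    target_logspec : close-talking log power for the same bin, one value per frame
    Returns [w_1, ..., w_L, bias].
    """
    X = [list(frame) + [1.0] for frame in mic_logspec]  # append a constant column for the bias
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, target_logspec)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, frame):
    """Estimated close-talking log power from one frame of distributed-mic log powers."""
    return sum(w * v for w, v in zip(weights, frame)) + weights[-1]
```

In practice this fit would be repeated for every frequency bin; the pure-Python solver only keeps the sketch dependency-free.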
On the least squares signal approximation model for overdecimated rational nonuniform filter banks and applications
A. Tkacenko, P. Vaidyanathan
DOI: 10.1109/ICASSP.2003.1201723 (https://doi.org/10.1109/ICASSP.2003.1201723)
Publication date: 2003-07-06
Abstract: With the advent of wavelets for lossy data compression came the notion of representing signals in a vector space by their projections onto well-chosen subspaces of the original space. In this paper, we consider the subspace of signals generated by an overdecimated rational nonuniform filter bank and find the optimal conditions under which the mean-squared error between a given deterministic signal and its representation in this subspace is minimized for a fixed set of synthesis filters. Under these optimal conditions, it is shown that choosing the synthesis filters to further minimize this error is simply an energy compaction problem. With this, we introduce the notion of deterministic energy compaction filters for classes of signals. Simulation results are presented showing the merit of our proposed method for optimizing the synthesis filters.
Citations: 3
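
For a fixed synthesis filter, the best approximation of a signal within the subspace it generates is an orthogonal projection, obtained from the Gram system of the shifted filter copies. The sketch below covers only the uniform, single-channel special case (shifts of one filter f by multiples of M, truncated to length N), not the overdecimated rational nonuniform banks treated in the paper; all names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b for a small, nonsingular dense system (Gaussian elimination, partial pivoting)."""
    n = len(A)
    m = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def shifted_basis(f, M, K, N):
    """Basis vectors f[n - k*M] for k = 0..K-1, truncated to length N."""
    basis = []
    for k in range(K):
        v = [0.0] * N
        for i, coef in enumerate(f):
            if k * M + i < N:
                v[k * M + i] = coef
        basis.append(v)
    return basis

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(x, basis):
    """Least-squares coefficients (Gram system G c = <b_k, x>), the approximation, and its error energy."""
    gram = [[dot(bi, bj) for bj in basis] for bi in basis]
    rhs = [dot(bi, x) for bi in basis]
    c = solve(gram, rhs)
    approx = [sum(ck * b[n] for ck, b in zip(c, basis)) for n in range(len(x))]
    err = sum((xi - ai) ** 2 for xi, ai in zip(x, approx))
    return c, approx, err
```

With f = [1, 1] and M = 2 the shifted copies are disjoint, so each coefficient is just the mean of its two-sample block; with overlapping filters the Gram matrix couples the coefficients. The remaining error energy is what the paper's synthesis-filter optimization then seeks to reduce.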
Audio-visual synchrony for detection of monologues in video archives
G. Iyengar, H. Nock, C. Neti
DOI: 10.1109/ICASSP.2003.1200085 (https://doi.org/10.1109/ICASSP.2003.1200085)
Publication date: 2003-07-06
Abstract: We present our approach to detecting monologues in video shots. A monologue shot is defined as a shot containing a talking person in the video channel with the corresponding speech in the audio channel. Although motivated by the TREC 2002 Video Retrieval Track (VT02), the underlying approach of synchrony between audio and video signals is also applicable to voice- and face-based biometrics, to assessing lip-synchronization quality in movie editing, and to speaker localization in video. Our approach is a two-part scheme. We first detect the occurrence of speech and a face in a video shot. Among shots containing both speech and a face, monologue shots are those in which the speech and facial movements are synchronized. To measure the synchrony between speech and facial movements, we use a mutual-information-based measure. Experiments on the VT02 corpus indicate that using synchrony improves average precision by more than 50% relative to using face and speech information alone. Our synchrony-based monologue detector had the best average precision in VT02 among 18 different submissions.
Citations: 23
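
A mutual-information synchrony score can be illustrated with a plug-in histogram estimate over two per-frame feature tracks, for example audio energy and a mouth-region motion measure. The quantizer, the number of levels, and the feature choice are assumptions for illustration; the abstract does not say which estimator the authors used.

```python
import math
from collections import Counter

def quantize(values, levels):
    """Map values uniformly onto integer levels 0..levels-1 (constant input maps to 0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * levels), levels - 1) for v in values]

def mutual_information(a, b, levels=4):
    """Plug-in estimate (in bits) of I(A; B) from two equal-length per-frame feature tracks."""
    qa, qb = quantize(a, levels), quantize(b, levels)
    n = len(qa)
    joint = Counter(zip(qa, qb))
    pa, pb = Counter(qa), Counter(qb)
    mi = 0.0
    for (i, j), count in joint.items():
        pij = count / n
        mi += pij * math.log2(pij / ((pa[i] / n) * (pb[j] / n)))
    return mi
```

A shot already containing speech and a face would then be labelled a monologue when this score exceeds a threshold; the threshold would have to be tuned on labelled data.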
Optimal sampling functions in nonuniform sampling driver designs to overcome the Nyquist limit
F. Papenfuß, Y. Artyukh, E. Boole, D. Timmermann
DOI: 10.1109/ICASSP.2003.1201667 (https://doi.org/10.1109/ICASSP.2003.1201667)
Publication date: 2003-07-06
Abstract: In some applications the observed samples are inherently nonuniform. In contrast, this paper takes advantage of deliberate nonuniform sampling and performs DSP where the classical approaches leave off, for instance in mobile communication or digital radio. Deliberate nonuniform sampling promises increased equivalent sampling rates with reduced overall hardware costs. The equivalent sampling rate is the sampling rate that a uniform sampling device would require in order to achieve the same processing bandwidth. While the equivalent bandwidth of a realizable system may well extend into the GHz range, its mean sampling rate is usually in the MHz range. Existing prototype systems achieve 40 times the bandwidth of a classic DSP system operating with uniform sampling (Artyukh et al. (1997)). Many sampling schemes have been investigated in the literature on nonuniform sampling (e.g. Bilinskis et al. (1992), Marvasti (2001), and Wojtiuk (2000)). In this paper the authors discuss a nonuniform sampling scheme that is especially suited to implementation in digital devices, thus fully exploiting state-of-the-art ADCs without violating their specifications. An analysis of the statistical properties of the algorithm is given to demonstrate common pitfalls and to prove its correctness.
Citations: 7
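
A scheme that is easy to realize in digital hardware can be illustrated with additive random sampling on a clock grid: every interval is the ADC's minimum conversion time plus a random number of extra clock ticks, so no two samples ever come closer than the converter allows. This is a generic sketch under assumed parameters, not the specific sampling function proposed in the paper; the name sampling_ticks is hypothetical.

```python
import random

def sampling_ticks(n_samples, min_gap_ticks, max_extra_ticks, seed=0):
    """Sampling instants, in clock ticks, for additive random sampling on a digital grid.

    min_gap_ticks   : ticks the (hypothetical) ADC needs between conversions
    max_extra_ticks : random extra delay drawn uniformly from 0..max_extra_ticks
    """
    rng = random.Random(seed)  # seeded so the pattern is reproducible
    ticks, t = [], 0
    for _ in range(n_samples):
        ticks.append(t)
        # Next instant = minimum conversion interval + random jitter, both whole clock ticks.
        t += min_gap_ticks + rng.randint(0, max_extra_ticks)
    return ticks
```

For example, with an assumed 1 GHz clock, min_gap_ticks=20 and max_extra_ticks=10 give a mean interval of 25 ns, that is a 40 MHz mean rate, while the instants still fall on a 1 ns grid. The randomness of the instants is what nonuniform schemes rely on to push aliasing beyond the limit implied by the mean rate.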
Evidence-based object tracking via global energy maximization
J. Carter, P. Lappas, R. Damper
DOI: 10.1109/ICASSP.2003.1199521 (https://doi.org/10.1109/ICASSP.2003.1199521)
Publication date: 2003-07-06
Abstract: This paper describes a robust algorithm for tracking arbitrary objects in long image sequences. The technique extends the dynamic Hough transform proposed in our earlier work to detect arbitrary shapes undergoing affine motion. The proposed tracking algorithm processes the whole image sequence globally. First, the object boundary is represented in lookup-table form; we then estimate the energy of the motion trajectory in the parameter space. An extra term in the cost function incorporates smoothness of deformation. The object is actually rigid, so by "deformation" we mean changes due to rotation or scaling of the object. There is no need for training or initialization, and an efficient implementation can be achieved with coarse-to-fine dynamic programming and pruning. Because of its evidence-based nature, the method is robust to noise and occlusion.
Citations: 4
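
The global optimisation described above (maximise accumulated evidence along a whole trajectory while penalising abrupt parameter changes) has the shape of a Viterbi-style dynamic program. The sketch assumes a small discrete set of candidate pose states per frame with precomputed evidence energies and a squared-difference smoothness penalty weighted by a hypothetical lam; it omits the paper's Hough accumulation, coarse-to-fine search, and pruning.

```python
def track(energies, lam):
    """Best state sequence maximising sum(evidence) - lam * sum(squared parameter change).

    energies : list over frames of dicts {state_tuple: evidence_energy}
    lam      : weight of the smoothness term (hypothetical)
    """
    def penalty(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    score = dict(energies[0])
    back = []  # per frame: best predecessor of each state
    for frame in energies[1:]:
        new_score, ptr = {}, {}
        for s, e in frame.items():
            prev = max(score, key=lambda p: score[p] - lam * penalty(p, s))
            new_score[s] = score[prev] - lam * penalty(prev, s) + e
            ptr[s] = prev
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Because the whole sequence is scored at once, a single strong but inconsistent detection (such as a distractor during occlusion) is rejected when the jump it requires costs more than its evidence gains.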
Content-adaptive filtering in the UMCTF framework
D. Turaga, M. Schaar
DOI: 10.1109/ICASSP.2003.1199551 (https://doi.org/10.1109/ICASSP.2003.1199551)
Publication date: 2003-07-06
Abstract: Unconstrained motion compensated temporal filtering (UMCTF) is a very general and flexible framework for temporal filtering. It allows the selection of many different filters and decomposition structures for easy adaptation to video content, bandwidth variations, and complexity requirements, and in conjunction with embedded coding it can provide spatio-temporal-SNR scalability. In this paper we demonstrate the content-adaptive filter selection provided within the UMCTF framework. We show improvements in coding efficiency as well as in decoded visual quality using content-adaptive filters at different granularities.
Citations: 20
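
Content-adaptive filter selection can be pictured at frame granularity as follows: for each odd frame, try several temporal prediction filters and keep the one whose high-pass (residual) frame has the least energy. This sketch assumes zero motion (no motion compensation), only two candidate filters (prediction from the previous frame, or from the average of both even neighbours), and no update step; it is an illustration, not the UMCTF implementation.

```python
def temporal_decompose(frames):
    """One temporal decomposition level with per-frame adaptive prediction (zero motion assumed).

    frames : list of frames, each a flat list of pixel values
    Returns (low_frames, high_frames, chosen_filter_per_high_frame).
    """
    low = frames[0::2]  # even frames pass through unchanged (no update step in this sketch)
    high, choices = [], []
    for i in range(1, len(frames), 2):
        prev = frames[i - 1]
        nxt = frames[i + 1] if i + 1 < len(frames) else prev  # repeat at the sequence end
        candidates = {
            "forward": prev,
            "bidirectional": [(a + b) / 2 for a, b in zip(prev, nxt)],
        }
        best_name, best_res, best_energy = None, None, None
        for name, pred in candidates.items():
            res = [x - p for x, p in zip(frames[i], pred)]
            energy = sum(r * r for r in res)
            if best_energy is None or energy < best_energy:
                best_name, best_res, best_energy = name, res, energy
        high.append(best_res)
        choices.append(best_name)
    return low, high, choices
```

A smooth brightness ramp favours bidirectional prediction, while a scene cut right after an odd frame favours forward prediction; the chosen filter label would be signalled to the decoder so it can invert the step.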
Analysis and reduction of reference frames for motion estimation in MPEG-4 AVC/JVT/H.264
Yu-Wen Huang, Bing-Yu Hsieh, Tu-Chih Wang, Shao-Yi Chien, Shyh-Yih Ma, Chun-Fu Shen, Liang-Gee Chen
DOI: 10.1109/ICASSP.2003.1199128 (https://doi.org/10.1109/ICASSP.2003.1199128)
Publication date: 2003-07-06
Abstract: In the new video coding standard, MPEG-4 AVC/JVT/H.264, motion estimation is allowed to use multiple reference frames. The reference software adopts a full search scheme, and the computation increases in proportion to the number of searched reference frames. However, the reduction of prediction residues depends strongly on the nature of the sequence, not on the number of searched frames. We present a method to speed up the matching process for multiple reference frames. For each macroblock, we analyze the information available after intra prediction and motion estimation from the previous frame to determine whether it is necessary to search more frames. The information we use includes the selected mode, inter prediction residues, intra prediction residues, and motion vectors. Simulation results show that the proposed algorithm can save up to 90% of unnecessary frames while keeping the average miss rate of optimal frames below 4%.
Citations: 74
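
The per-macroblock early-termination decision can be sketched as a small rule over the four cues the abstract lists. The rule structure, the mode labels, and both thresholds below are hypothetical placeholders chosen for illustration; the paper's actual criteria are not given in the abstract.

```python
from dataclasses import dataclass

@dataclass
class MBStats:
    mode: str        # best mode after searching reference frame 0 (labels hypothetical)
    inter_sad: int   # best inter-prediction SAD from the previous frame
    intra_sad: int   # best intra-prediction SAD
    mv: tuple        # best motion vector into the previous frame, (x, y)

def search_more_references(s, sad_per_pixel_thr=4, mv_thr=8, pixels=256):
    """Hypothetical rule: decide whether to search reference frames beyond the previous one."""
    sad_limit = sad_per_pixel_thr * pixels  # residue budget for a 16x16 macroblock
    # Large-block mode, small motion, low residue: the previous frame already predicts well.
    if (s.mode in ("SKIP", "16x16")
            and max(abs(s.mv[0]), abs(s.mv[1])) <= mv_thr
            and s.inter_sad <= sad_limit):
        return False
    # Intra beats inter: content may be uncovered or occluded, so older frames may help.
    if s.intra_sad < s.inter_sad:
        return True
    # Otherwise search further only if the inter residue is still large.
    return s.inter_sad > sad_limit
```

The point of such a rule is that it only reuses quantities already computed for reference frame 0, so skipping the remaining frames costs almost nothing when the answer is "stop".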
Statistical shape theory for activity modeling
Namrata Vaswani, A. Roy-Chowdhury, R. Chellappa
DOI: 10.1109/ICASSP.2003.1199519 (https://doi.org/10.1109/ICASSP.2003.1199519)
Publication date: 2003-07-06
Abstract: Monitoring activities in a region from video data is an important surveillance problem. The goal is to learn the pattern of normal activities and to detect unusual ones by identifying activities that deviate appreciably from the typical ones. We propose an approach using statistical shape theory based on the shape model of D.G. Kendall et al. (see "Shape and Shape Theory", John Wiley and Sons, 1999). In a low-resolution video, each moving object is best represented as a moving point mass or particle. In this case, an activity can be defined by the interactions of all or some of these moving particles over time. We model the configuration of the particles by a polygonal shape formed from the locations of the points in a frame, and the activity by the deformation of the polygons over time. These parameters are learned for each typical activity. Given a test video sequence, an activity is classified as abnormal if the probability of the sequence (represented by the mean shape and the dynamics of the deviations) given the model is below a certain threshold. The approach gives very encouraging results in surveillance applications using a single camera and is able to identify various kinds of abnormal behavior.
Citations: 9
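
Kendall's shape model removes translation, scale, and rotation from a configuration of points. A common computational route, assumed here, stores the 2-D points as complex numbers, centres and unit-normalises them (the pre-shape), measures the full Procrustes distance between pre-shapes, and takes the full Procrustes mean as the dominant eigenvector of the sum of z z^H. The sketch replaces the paper's probabilistic test on the mean shape and deviation dynamics with a simple distance threshold, which is a deliberate simplification.

```python
import math

def preshape(points):
    """Centre and unit-normalise a 2-D configuration, stored as complex numbers."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]

def procrustes_distance(za, zb):
    """Full Procrustes distance between two pre-shapes; rotation is factored out via |<za, zb>|."""
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))

def mean_shape(configs, iters=50):
    """Full Procrustes mean: dominant eigenvector of sum_j z_j z_j^H, via power iteration."""
    shapes = [preshape(c) for c in configs]
    v = shapes[0][:]
    for _ in range(iters):
        w = [0j] * len(v)
        for z in shapes:
            coef = sum(p.conjugate() * q for p, q in zip(z, v))  # z^H v
            w = [wi + zi * coef for wi, zi in zip(w, z)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    return v

def is_abnormal(points, mean, threshold):
    """Simplified test: flag a configuration whose shape is far from the learned mean (threshold hypothetical)."""
    return procrustes_distance(preshape(points), mean) > threshold
```

A triangle that is only translated, rotated, and scaled stays at distance close to zero from the learned mean, whereas a collinear arrangement of the same number of particles does not.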