{"title":"Color image segmentation using density-based clustering","authors":"Qixiang Ye, Wen Gao, Wei Zeng","doi":"10.1109/ICASSP.2003.1199480","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199480","url":null,"abstract":"Color image segmentation is an important but still open problem in image processing. We propose a method for this problem by integrating the spatial connectivity and color features of the pixels. Considering that an image can be regarded as a dataset in which each pixel has a spatial location and a color value, color image segmentation can be obtained by clustering these pixels into different groups of coherent spatial connectivity and color. To discover the spatial connectivity of the pixels, density-based clustering is employed, which is an effective clustering method used in data mining for discovering spatial databases. The color similarity of the pixels is measured in Munsell (HVC) color space whose perceptual uniformity ensures the color change in the segmented regions is smooth in terms of human perception. Experimental results using the proposed method demonstrate encouraging performance.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131984795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D motion vector coding with block base adaptive interpolation filter on H.264","authors":"H. Kimata, Masaki Kitahara, Y. Yashima","doi":"10.1109/ICASSP.2003.1199554","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199554","url":null,"abstract":"Fractional pel motion compensation generally improves coding efficiency due to more precise motion accuracy and low path filtering effect in generating an image at fractional pel positions. In H.264, quarter pel motion compensation is applied, where the image at half pel position is generated by a 6 tap Wiener filter. And the adaptive interpolation filter technique, which adaptively changes filter characteristics for half pel positions has been proposed. That technique also changes the image at quarter pel positions, so it can be exploited to extend motion accuracy to be more precise. In this paper, a 3D motion vector coding (3DMVC) technique with block base adaptive interpolation filter (BAIF) is proposed. This paper also demonstrates the proposed method ensures filter data is successfully integrated into motion vector coding and outperforms the normal H.264.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"23 1 Suppl 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128019607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frame-dependent multi-stream reliability indicators for audio-visual speech recognition","authors":"A. Garg, G. Potamianos, C. Neti, Thomas S. Huang","doi":"10.1109/ICASSP.2003.1198707","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198707","url":null,"abstract":"We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities, as a means of estimating stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators at each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio-or visual-only classifiers. We subsequently map the four reliability indicators into the stream exponents of a state-synchronous, two-stream hidden Markov model, as a sigmoid function of their linear combination. We propose two algorithms to estimate the sigmoid weights, based on the maximum conditional likelihood and minimum classification error criteria. We demonstrate the superiority of the proposed approach on a connected-digit audio-visual speech recognition task, under varying audio channel noise conditions. Indeed, the use of the estimated, frame-dependent stream exponents results in a significantly smaller word error rate than using global stream exponents. In addition, it outperforms utterance-level exponents, even though the latter utilize a-priori knowledge of the utterance noise level.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. 
(ICASSP '03).","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116750239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fusion scheme of visual and auditory modalities for event detection in sports video","authors":"Mingliang Xu, Ling-yu Duan, Changsheng Xu, Q. Tian","doi":"10.1109/ICASSP.2003.1199139","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199139","url":null,"abstract":"We propose an effective fusion scheme of visual and auditory modalities to detect events in sports video. The proposed scheme is built upon semantic shot classification, where we classify video shots into several major or interesting classes, each of which has clear semantic meanings. Among major shot classes we perform classification of the different auditory signal segments (i.e. silence, hitting ball, applause, commentator speech) with the goal of detecting events with strong semantic meaning. For instance, for tennis video, we have identified five interesting events: serve, reserve, ace, return, and score. Since we have developed a unified framework for semantic shot classification in sports videos and a set of audio mid-level representation with supervised learning methods, the proposed fusion scheme can be easily adapted to a new sports game. We are extending this fusion scheme to three additional typical sports videos: basketball, volleyball and soccer. Correctly detected sports video events will greatly facilitate further structural and temporal analysis, such as sports video skimming, table of content, etc.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. 
(ICASSP '03).","volume":"120 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133192791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tone feature extraction through parametric modeling and analysis-by-synthesis-based pattern matching","authors":"Jinfu Ni, H. Kawai","doi":"10.1109/ICASSP.2003.1198719","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198719","url":null,"abstract":"A functional fundamental frequency (F/sub 0/) model is applied to extract tone peak and gliding features from Mandarin F/sub 0/ contours aiming at automatic prosodic labeling of a large scale speech corpus. Modeling four lexical tones and representing them in a parametric form based on the F/sub 0/ model, we first cluster baseline tone patterns using the LBG (Linde-Buzo-Gray) algorithm, then perform analysis-by-synthesis-based pattern matching to estimate underlying tone peaks and tone pattern types from observed F/sub 0/ contours and phonetic labels with lexical tones. Tone gliding features are re-estimated after the determination of tone peaks. 94% of the automatically estimated labels were consistent with the manual labels in an open test of 968 utterances from eight native speakers. Also, experimental results indicate that the proposed method is applicable for F/sub 0/ contour smoothing and tone verification.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"433 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120897430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Successive bit-plane rate allocation technique for JPEG2000 image coding","authors":"Y. M. Yeung, O. Au, A. Chang","doi":"10.1109/ICASSP.2003.1199157","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199157","url":null,"abstract":"A novel rate control scheme using successive bit-plane rate allocation (SBRA) is proposed for JPEG2000 image coding. By using the current rate-distortion information only, the proposed method can achieve a quality close to the post-compression rate-distortion (PCRD) optimization scheme adopted in JPEG2000. The proposed scheme can efficiently reduce both the computational cost and working memory size of the entropy coding process up to about 90%, in the case of 0.25bpp (1/32) compression. Without using the future rate-distortion information, the sequential property of the proposed method is highly suitable for real-time (or low delay) applications and implementation.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122271506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DBN based multi-stream models for speech","authors":"Yimin Zhang, Q. Diao, Shan Huang, Wei Hu, C. Bartels, J. Bilmes","doi":"10.1109/ICASSP.2003.1198911","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198911","url":null,"abstract":"We propose dynamic Bayesian network (DBN) based synchronous and asynchronous multi-stream models for noise-robust automatic speech recognition. In these models, multiple noise-robust features are combined into a single DBN to obtain better performance than any single feature system alone. Results on the Aurora 2.0 noisy speech task show significant improvements of our synchronous model over both single stream models and over a ROVER based fusion method.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115387776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A small sample model selection criterion based on Kullback's symmetric divergence","authors":"A. Seghouane, M. Bekara, G. Fleury","doi":"10.1109/ICASSP.2003.1201639","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1201639","url":null,"abstract":"The Kullback information criterion (KIC) is a recently developed tool for statistical model selection (Cavanaugh, J.E., Statistics and Probability Letters, vol.42, p.333-43, 1999). KIC serves as an asymptotically unbiased estimator of a variant of the Kullback symmetric divergence, known also as J-divergence. A bias correction of the Kullback symmetric information criterion is derived for linear models. The correction is of particular use when the sample size is small or when the number of fitted parameters is of a moderate to large fraction of the sample size. For linear regression models, the corrected method, called KICc, is an exactly unbiased estimator of a variant of the Kullback symmetric divergence between the true unknown model and the candidate fitted model. Furthermore, KICc is found to provide better model order choice than any other asymptotically efficient methods when applied to autoregressive time series models.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115463566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-channel speaker identification using usable speech extraction based on multi-pitch tracking","authors":"Yang Shao, Deliang Wang","doi":"10.1109/ICASSP.2003.1202330","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202330","url":null,"abstract":"Recently, usable speech criteria have been proposed to extract minimally corrupted speech for speaker identification (SID) in co-channel speech. In this paper, we propose a new usable speech extraction method to improve the SID performance under the co-channel situation based on the pitch information obtained from a robust multi-pitch tracking algorithm [2]. The idea is to retain the speech segments that have only one pitch detected and remove the others. The system is evaluated on co-channel speech and results show a significant improvement across various target to interferer ratios (TIR) for speaker identification.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124417392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind (training-like) decoder assisted beamforming for DS-CDMA systems","authors":"R. Pacheco, D. Hatzinakos","doi":"10.1109/ICASSP.2003.1202672","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202672","url":null,"abstract":"We propose an iterative blind beamforming strategy for short-burst high-rate DS-CDMA systems. The blind strategy works by creating a set of \"training sequences\" in the receiver that is used as input to a semi-blind beamforming algorithm, thus producing a corresponding set of beamformers. The objective then becomes to find which beamformer gives the best performance (smallest bit error). Two challenges we face are: (1) to find a semi-blind algorithm that requires very few training symbols (to minimize the search time); (2) to find an appropriate criterion for picking the beamformer that offers the best performance. Different semi-blind algorithms and criteria are tested. The recently proposed SBCMACI (semi-blind CMA with channel identification) (Casella, I.R.S. et al., PIMRC, p.1972-6, 2002) is demonstrated to be ideal because of how few training symbols it needs for convergence. Of the tested criteria, one based on feedback from the decoder (essentially using trellis information) is shown to achieve nearly optimal performance.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123080087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}