2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

A Prediction Model for End-of-Utterance Based on Prosodic Features and Phrase-Dependency in Spontaneous Japanese
Y. Ishimoto, Takehiro Teraoka, M. Enomoto
DOI: 10.23919/APSIPA.2018.8659535
Abstract: This study aims to reveal clues for predicting end-of-utterance in spontaneous Japanese speech. In casual everyday conversation, participants must predict the ends of a speaker's utterances to achieve smooth turn-taking with small gaps or overlaps. Syntactic and prosodic factors are thought to signal the end of an utterance, and participants use them to predict it. In this paper, we focused on the dependency structure among bunsetsu-phrases as a syntactic feature, and on F0, intensity, and mora duration of bunsetsu-phrases as prosodic features. We investigated the relationship between the position of a bunsetsu-phrase in an utterance and these features. The results showed that no single feature is an authoritative clue for determining the position of a bunsetsu-phrase. Next, we constructed a Bayesian hierarchical model to estimate bunsetsu-phrase position from the syntactic and prosodic features. The model indicated that the usefulness of the prosodic features varies across speakers. This suggests that a different combination of syntactic and prosodic features is relevant for each speaker when predicting the ends of utterances.
Citations: 0
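A minimal stand-in for the kind of per-speaker prediction the abstract describes: a logistic combination of prosodic/syntactic features with speaker-specific weights. The paper's Bayesian hierarchical model is richer; the feature names and weights below are illustrative, not taken from the paper.

```python
import math

def eou_probability(features, speaker_weights, bias=0.0):
    """Toy end-of-utterance predictor: a logistic combination of
    prosodic/syntactic feature values with per-speaker weights.
    Illustrates only the idea that each speaker weighs the cues
    differently; the paper's hierarchical model shares information
    across speakers."""
    z = bias + sum(speaker_weights[name] * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

A speaker whose weights put no mass on a feature is predicted at chance (0.5) regardless of that feature's value.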
Low-Frequency Character Clustering for End-to-End ASR System
Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, Takeshi S. Kobayakawa, T. Mishima, Shoei Sato, A. Kobayashi
DOI: 10.23919/APSIPA.2018.8659735
Abstract: We developed a label-designing and restoration method for end-to-end automatic speech recognition based on connectionist temporal classification (CTC). With an end-to-end speech-recognition system whose output includes thousands of labels such as words or characters, it is difficult to train a robust model because of data sparsity. With our proposed method, characters with little training data are estimated from the context of a language model rather than from acoustic features. The method involves two steps. First, we train acoustic models using 70 class labels in place of thousands of low-frequency labels. Second, the class labels are restored to the original labels using a weighted finite-state transducer and an n-gram language model. We applied the proposed method to a Japanese end-to-end automatic speech-recognition system with over 3,000 character labels. Experimental results indicate that our method improved the word error rate by a maximum of 15.5% relative compared with a conventional CTC-based method, and is comparable to state-of-the-art hybrid DNN methods.
Citations: 0
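The first step (replacing rare characters with a small shared label set) can be sketched as follows. The modulo-hash class assignment is a placeholder; the paper derives its 70 classes differently.

```python
from collections import Counter

def relabel_low_frequency(transcripts, n_classes=70, min_count=5):
    """Replace characters with too little training data by one of
    `n_classes` shared class labels, so the CTC acoustic model only
    has to learn frequent characters plus a small set of placeholder
    classes.  The hash-based assignment below is illustrative only."""
    counts = Counter(ch for line in transcripts for ch in line)

    def label(ch):
        if counts[ch] >= min_count:
            return ch                       # frequent: keep as-is
        return "<C%d>" % (hash(ch) % n_classes)  # rare: map to a class

    return [[label(ch) for ch in line] for line in transcripts]
```

Restoration (step 2) would then map the class labels back to concrete characters via a WFST composed with an n-gram language model.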
Optimizing the Performance of Halftoning-Based Block Truncation Coding
Zi-Xin Xu, Y. Chan, D. Lun
DOI: 10.23919/APSIPA.2018.8659744
Abstract: Block Truncation Coding (BTC) is an effective lossy image coding technique that offers both high efficiency and low complexity, especially when halftoning techniques are employed to shape the noise spectrum of its output. However, due to its block-based nature, blocking artifacts are commonly found in its coding output, and post-processing schemes are generally applied to mitigate the problem. Recently, a halftoning-based BTC algorithm was proposed to eliminate the cause of the blocking artifacts. In this paper, through an optimization step, the performance of that algorithm is optimized with respect to a given objective measure. The idea can be adopted with other halftoning methods to optimize other measures, suiting different needs in different circumstances.
Citations: 0
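For reference, the classic (non-halftoning) BTC quantizer that this line of work builds on encodes each block as a 1-bit map plus two levels chosen to preserve the block's mean and variance:

```python
def btc_encode_block(block):
    """Classic BTC for one flattened image block: a bitmap of which
    pixels sit at/above the block mean, plus two reconstruction
    levels that preserve the block mean and variance.  This is the
    textbook baseline, not the paper's halftoning-optimized variant."""
    n = len(block)
    mean = sum(block) / n
    std = (sum((p - mean) ** 2 for p in block) / n) ** 0.5
    bitmap = [1 if p >= mean else 0 for p in block]
    q = sum(bitmap)                      # pixels at/above the mean
    if q == 0 or q == n:                 # flat block: one level suffices
        return mean, mean, bitmap
    low = mean - std * (q / (n - q)) ** 0.5
    high = mean + std * ((n - q) / q) ** 0.5
    return low, high, bitmap

def btc_decode_block(low, high, bitmap):
    """Reconstruct a block from its two levels and bitmap."""
    return [high if b else low for b in bitmap]
```

Because reconstruction is independent per block, level discontinuities at block borders are exactly the blocking artifacts the halftoning-based variant removes.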
Data Hiding in MP4 Video Container based on Subtitle Track
ChuanSheng Chan, Koksheik Wong, Imdad MaungMaung
DOI: 10.23919/APSIPA.2018.8659643
Abstract: This paper proposes a data hiding method for the MP4 container format. Specifically, the synchronization between the subtitle and audio-video tracks is exploited to hide data: the time scale is first enlarged, and sample-duration pairs are then modified to embed data. The proposed method hides data reversibly when the payload is relatively small, and switches to an irreversible mode to offer a higher payload. Although the synchronization between the audio-video and subtitle tracks is manipulated, any delay or advance in the displayed subtitles is imperceptible, and the file size of the processed MP4 file is completely preserved. Subjective evaluations verify the basic performance of the proposed method.
Citations: 1
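The timescale/duration mechanism can be illustrated with a toy reconstruction (not the paper's exact scheme): enlarge the track's time scale so each old tick becomes several new ticks, then hide one bit per sample as a one-tick tweak that is far below the threshold of perceptible subtitle desynchronization.

```python
def embed_bits(durations, bits, scale=10):
    """Toy duration-pair embedding: multiply the time scale by `scale`
    (each old tick = `scale` new ticks), then encode a 1-bit as +1 tick
    and a 0-bit as -1 tick on successive sample durations.
    Illustrative reconstruction only."""
    scaled = [d * scale for d in durations]
    stego = [d + (1 if b else -1) for d, b in zip(scaled, bits)]
    return stego + scaled[len(bits):]    # samples past the payload untouched

def extract_bits(stego, scale=10):
    """Residue 1 mod `scale` encodes a 1-bit, residue scale-1 a 0-bit;
    unmodified samples (residue 0) carry no payload."""
    bits = []
    for d in stego:
        r = d % scale
        if r == 1:
            bits.append(1)
        elif r == scale - 1:
            bits.append(0)
    return bits
```

In this toy form the embedding is also reversible: rounding each duration to the nearest multiple of `scale` recovers the cover values exactly, loosely mirroring the paper's reversible mode.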
Ensemble Deep Learning Based Cooperative Spectrum Sensing with Stacking Fusion Center
Hang Liu, Xu Zhu, T. Fujii
DOI: 10.23919/APSIPA.2018.8659774
Abstract: In this paper, an ensemble learning (EL) framework is adopted for cooperative spectrum sensing (CSS) in a cognitive radio system based on orthogonal frequency division multiplexing (OFDM) signals. Each secondary user (SU) is treated as a base learner whose local spectrum sensing estimates the probability of the primary user (PU) being active or inactive. Convolutional neural networks with a simple architecture are applied, given their strength in image recognition and the limited computational ability of each SU, and the cyclic spectral correlation feature is used as the input data. For supervised learning, a bagging strategy is used to build the training database. For the global decision, the fusion center employs stacked generalization to learn how to combine the SUs' classification outputs on the PU status. Our method shows significant advantages over conventional CSS methods in terms of detection probability and false-alarm probability.
Citations: 9
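The fusion step can be reduced to the following sketch, with the trained stacking combiner collapsed to a weighted vote over the base learners' local probabilities (the weights and threshold are illustrative; the paper learns the combiner).

```python
def stacking_decision(local_probs, meta_weights, threshold=0.5):
    """Fusion-centre sketch: each secondary user reports its local
    probability that the primary user is active; a pre-trained
    combiner -- reduced here to a weighted vote -- makes the global
    active/inactive decision."""
    score = sum(w * p for w, p in zip(meta_weights, local_probs))
    return score >= threshold
```

In the paper, this combiner is itself a learned model (stacked generalization), which lets the fusion center down-weight unreliable SUs automatically.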
Multichannel NMF with Reduced Computational Complexity for Speech Recognition
T. Izumi, Takanobu Uramoto, Shingo Uenohara, K. Furuya, Ryo Aihara, Toshiyuki Hanazawa, Y. Okato
DOI: 10.23919/APSIPA.2018.8659493
Abstract: In this study, we propose a method that reduces the number of computational iterations of multichannel NMF (MNMF) for speech recognition. The proposed method initializes the MNMF algorithm with an estimated spatial correlation matrix, which reduces the number of iterations needed by the update algorithm; here, mask-based enhancement via the expectation-maximization (EM) algorithm is used to estimate the spatial correlation matrix. As a further method, we propose reducing computational complexity by decimating the updates of the spatial correlation matrix H. The experimental results indicate that our methods reduced the computational complexity of MNMF while maintaining the performance of conventional MNMF.
Citations: 0
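The saving from decimating the spatial-correlation update can be seen in a schematic loop. The real updates are multiplicative MNMF rules over complex-valued spatial matrices; here they are only counted, to show where the complexity reduction comes from.

```python
def run_iterations(n_iters, update_every):
    """Schematic MNMF loop: the cheap basis/activation updates run
    every iteration, while the expensive spatial-correlation-matrix
    update (the dominant cost) runs only every `update_every` steps.
    Counts stand in for the actual update rules."""
    cheap_updates = 0
    expensive_updates = 0
    for it in range(n_iters):
        cheap_updates += 1            # basis/activation updates
        if it % update_every == 0:
            expensive_updates += 1    # decimated update of H
    return cheap_updates, expensive_updates
```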
A Digital Modeling Technique for Distortion Effect Based on a Machine Learning Approach
Yuto Matsunaga, N. Aoki, Y. Dobashi, Tsuyoshi Yamamoto
DOI: 10.23919/APSIPA.2018.8659547
Abstract: This paper describes experimental results of modeling distortion-effect stomp boxes with a machine learning approach. Our proposed technique models a distortion stomp box as a neural network consisting of a CNN and an LSTM. In this approach, the CNN is employed to model the linear components that appear in the pre- and post-filters of the stomp box, while the LSTM is employed to model the nonlinear component that appears in its distortion process. All parameters are estimated through training on the input and output signals of the distortion stomp box. The experimental results indicate that the proposed technique has the potential to replicate distortion stomp boxes appropriately with a well-trained neural network.
Citations: 2
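The linear/nonlinear decomposition the network learns corresponds to a classic signal chain, which can be sketched directly (filter taps and drive below are illustrative, not fitted to any real pedal):

```python
import math

def fir(x, taps):
    """Causal FIR filter by direct convolution."""
    return [sum(t * x[n - i] for i, t in enumerate(taps) if n - i >= 0)
            for n in range(len(x))]

def distortion_chain(signal, pre_taps, post_taps, drive=5.0):
    """Reference structure the network is trained to mimic:
    linear pre-filter -> memoryless soft clipper -> linear post-filter.
    In the paper, the linear parts are captured by the CNN and the
    nonlinear/stateful part by the LSTM."""
    pre = fir(signal, pre_taps)
    clipped = [math.tanh(drive * s) for s in pre]   # soft clipping
    return fir(clipped, post_taps)
```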
Sequential Generation of Singing F0 Contours from Musical Note Sequences Based on WaveNet
Yusuke Wada, Ryo Nishikimi, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii
DOI: 10.23919/APSIPA.2018.8659502
Abstract: This paper describes a method that can generate a continuous F0 contour of a singing voice from a monophonic sequence of musical notes (a musical score) by using the deep autoregressive model WaveNet. Real F0 contours include complicated temporal and frequency fluctuations caused by singing expressions such as vibrato and portamento. Although explicit models such as hidden Markov models (HMMs) have often been used to represent F0 dynamics, it is difficult for them to generate realistic F0 contours because of their limited representation capability. To overcome this limitation, WaveNet, which was invented for modeling raw waveforms in an unsupervised manner, was recently used to generate singing F0 contours from a musical score with lyrics in a supervised manner. Inspired by this attempt, we investigate the capability of WaveNet to generate singing F0 contours without lyric information. Our method conditions WaveNet on the pitch and contextual features of a musical score. As a loss function better suited to generating F0 contours, we adopted a modified cross-entropy loss weighted with the squared error between the target and output F0s on the log-frequency axis. The experimental results show that these techniques improve the quality of the generated F0 contours.
Citations: 7
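One plausible reading of the modified loss (an assumption; the paper's exact weighting may differ): keep the cross-entropy term on the target F0 bin, and additionally penalize probability mass in other bins by its squared distance to the target on the log-frequency axis.

```python
import math

def weighted_f0_loss(probs, target_bin, bin_freqs):
    """Illustrative loss over quantized F0 bins: standard cross-entropy
    on the target bin, plus off-target probability mass weighted by
    squared log-frequency distance, so near-misses (e.g. one semitone
    off) cost less than octave errors.  A hedged reconstruction of the
    loss described in the abstract, not the paper's exact formula."""
    log_t = math.log2(bin_freqs[target_bin])
    loss = 0.0
    for k, p in enumerate(probs):
        if k == target_bin:
            loss += -math.log(max(p, 1e-12))            # CE term
        else:
            dist = (math.log2(bin_freqs[k]) - log_t) ** 2
            loss += dist * p                            # distance penalty
    return loss
```

Unlike plain cross-entropy, which treats all wrong bins identically, this form makes the penalty grow with pitch distance.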
Exploring redundancy of HRTFs for fast training DNN-based HRTF personalization
Tzu-Yu Chen, Po-Wen Hsiao, T. Chi
DOI: 10.23919/APSIPA.2018.8659704
Abstract: A deep neural network (DNN) is constructed to predict the magnitude responses of a user's head-related transfer functions (HRTFs) for a specific direction and a specific ear. Using the CIPIC HRTF database (25 azimuth angles and 50 elevation angles for both ears), we trained 2,500 DNNs to predict the magnitude responses of all HRTFs of a user. To reduce training time, we propose using the final weights of the trained DNN of a nearby direction as the initial weights of the DNN currently under training, since the magnitude responses of HRTFs change smoothly across nearby directions. Analysis of variance (ANOVA) shows that, in terms of the log-spectral distortion (LSD) measure, the proposed training scheme produces magnitude responses equivalent to those of the standard scheme with random initial weights, while reducing training time by more than 95%.
Citations: 1
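The warm-start scheme reduces to a nearest-neighbour lookup over already-trained directions. Plain Euclidean distance over (azimuth, elevation) pairs is an illustrative simplification (it ignores azimuth wrap-around and the sphere's geometry).

```python
def initial_weights(direction, trained, default):
    """Warm-start from the abstract: initialise the DNN for a new
    direction with the final weights of the nearest already-trained
    direction; fall back to `default` (e.g. random init) when no
    direction has been trained yet.  `trained` maps (azimuth,
    elevation) pairs to weight objects."""
    if not trained:
        return default
    nearest = min(trained, key=lambda d: (d[0] - direction[0]) ** 2
                                         + (d[1] - direction[1]) ** 2)
    return trained[nearest]
```

Because neighbouring HRTF magnitude responses are similar, training from these weights starts near a good optimum, which is where the >95% time saving comes from.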
Journal Name Extraction from Japanese Scientific News Articles
M. Kikuchi, Mitsuo Yoshida, Kyoji Umemura
DOI: 10.23919/APSIPA.2018.8659765
Abstract: In Japanese scientific news articles, although research results are described clearly, the articles' sources tend to go uncited, which makes it difficult for readers to learn the details of the research. In this paper, we address the task of extracting journal names from Japanese scientific news articles. We hypothesize that a journal name is likely to occur in a specific context. To test this hypothesis, we construct a character-based extraction method that uses only the left and right context features of journal names. The results of the journal-name extraction suggest that the distributional hypothesis plays an important role in identifying journal names.
Citations: 0
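The kind of feature this method relies on can be sketched as a character window on each side of a candidate span (the window width and the English example text are illustrative; the paper works on Japanese):

```python
def context_features(text, start, end, width=3):
    """Left/right character context of a candidate journal-name span
    text[start:end] -- the only kind of feature the character-based
    method uses.  `width` is an illustrative choice."""
    left = text[max(0, start - width):start]
    right = text[end:end + width]
    return {"left": left, "right": right}
```

Extraction then scores spans by how often their contexts match contexts observed around known journal names, which is the distributional hypothesis in action.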