2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

A Prediction Model for End-of-Utterance Based on Prosodic Features and Phrase-Dependency in Spontaneous Japanese
Y. Ishimoto, Takehiro Teraoka, M. Enomoto
DOI: 10.23919/APSIPA.2018.8659535
Abstract: This study aims to reveal clues for predicting end-of-utterance in spontaneous Japanese speech. In casual everyday conversation, participants must predict the ends of a speaker's utterances to achieve smooth turn-taking with small gaps or overlaps. Syntactic and prosodic factors are thought to signal the end of an utterance, and participants use them to predict it. In this paper, we focused on the dependency structure among bunsetsu-phrases as a syntactic feature, and on F0, intensity, and mora duration of bunsetsu-phrases as prosodic features. We investigated the relationship between the position of a bunsetsu-phrase in an utterance and these features. The results showed that no single feature is an authoritative clue for determining the position of a bunsetsu-phrase. Next, we constructed a Bayesian hierarchical model to estimate bunsetsu-phrase position from the syntactic and prosodic features. The model indicated that the usefulness of the prosodic features varies across speakers. This suggests that a different combination of syntactic and prosodic features is relevant for each speaker when predicting the ends of utterances.
Citations: 0
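A minimal stand-in for the kind of per-speaker prediction the abstract describes: a logistic combination of prosodic/syntactic features with speaker-specific weights. The paper's Bayesian hierarchical model is richer; the feature names and weights below are illustrative, not taken from the paper.

```python
import math

def eou_probability(features, speaker_weights, bias=0.0):
    """Toy end-of-utterance predictor: a logistic combination of
    prosodic/syntactic feature values with per-speaker weights.
    Illustrates only the idea that each speaker weighs the cues
    differently; the paper's hierarchical model shares information
    across speakers."""
    z = bias + sum(speaker_weights[name] * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

A speaker whose weights put no mass on a feature is predicted at chance (0.5) regardless of that feature's value.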
Low-Frequency Character Clustering for End-to-End ASR System
Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, Takeshi S. Kobayakawa, T. Mishima, Shoei Sato, A. Kobayashi
DOI: 10.23919/APSIPA.2018.8659735
Abstract: We developed a label-designing and restoration method for end-to-end automatic speech recognition based on connectionist temporal classification (CTC). With an end-to-end speech-recognition system whose output includes thousands of labels such as words or characters, it is difficult to train a robust model because of data sparsity. With our proposed method, characters with little training data are estimated from the context of a language model rather than from acoustic features. The method involves two steps. First, we train acoustic models using 70 class labels in place of thousands of low-frequency labels. Second, the class labels are restored to the original labels using a weighted finite-state transducer and an n-gram language model. We applied the proposed method to a Japanese end-to-end automatic speech-recognition system with over 3,000 character labels. Experimental results indicate that our method improved the word error rate by a maximum of 15.5% relative compared with a conventional CTC-based method, and is comparable to state-of-the-art hybrid DNN methods.
Citations: 0
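The first step (replacing rare characters with a small shared label set) can be sketched as follows. The modulo-hash class assignment is a placeholder; the paper derives its 70 classes differently.

```python
from collections import Counter

def relabel_low_frequency(transcripts, n_classes=70, min_count=5):
    """Replace characters with too little training data by one of
    `n_classes` shared class labels, so the CTC acoustic model only
    has to learn frequent characters plus a small set of placeholder
    classes.  The hash-based assignment below is illustrative only."""
    counts = Counter(ch for line in transcripts for ch in line)

    def label(ch):
        if counts[ch] >= min_count:
            return ch                       # frequent: keep as-is
        return "<C%d>" % (hash(ch) % n_classes)  # rare: map to a class

    return [[label(ch) for ch in line] for line in transcripts]
```

Restoration (step 2) would then map the class labels back to concrete characters via a WFST composed with an n-gram language model.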
Optimizing the Performance of Halftoning-Based Block Truncation Coding
Zi-Xin Xu, Y. Chan, D. Lun
DOI: 10.23919/APSIPA.2018.8659744
Abstract: Block Truncation Coding (BTC) is an effective lossy image coding technique that offers both high efficiency and low complexity, especially when halftoning techniques are employed to shape the noise spectrum of its output. However, due to its block-based nature, blocking artifacts are commonly found in its coding output, and post-processing schemes are generally applied to mitigate the problem. Recently, a halftoning-based BTC algorithm was proposed to eliminate the cause of the blocking artifacts. In this paper, through an optimization step, the performance of that algorithm is optimized with respect to a given objective measure. The idea can be adopted with other halftoning methods to optimize other measures, suiting different needs in different circumstances.
Citations: 0
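For reference, the classic (non-halftoning) BTC quantizer that this line of work builds on encodes each block as a 1-bit map plus two levels chosen to preserve the block's mean and variance:

```python
def btc_encode_block(block):
    """Classic BTC for one flattened image block: a bitmap of which
    pixels sit at/above the block mean, plus two reconstruction
    levels that preserve the block mean and variance.  This is the
    textbook baseline, not the paper's halftoning-optimized variant."""
    n = len(block)
    mean = sum(block) / n
    std = (sum((p - mean) ** 2 for p in block) / n) ** 0.5
    bitmap = [1 if p >= mean else 0 for p in block]
    q = sum(bitmap)                      # pixels at/above the mean
    if q == 0 or q == n:                 # flat block: one level suffices
        return mean, mean, bitmap
    low = mean - std * (q / (n - q)) ** 0.5
    high = mean + std * ((n - q) / q) ** 0.5
    return low, high, bitmap

def btc_decode_block(low, high, bitmap):
    """Reconstruct a block from its two levels and bitmap."""
    return [high if b else low for b in bitmap]
```

Because reconstruction is independent per block, level discontinuities at block borders are exactly the blocking artifacts the halftoning-based variant removes.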
Data Hiding in MP4 Video Container based on Subtitle Track
ChuanSheng Chan, Koksheik Wong, Imdad MaungMaung
DOI: 10.23919/APSIPA.2018.8659643
Abstract: This paper proposes a data hiding method for the MP4 container format. Specifically, the synchronization between the subtitle and audio-video tracks is exploited to hide data: the time scale is first enlarged, and sample-duration pairs are then modified to embed data. The proposed method hides data reversibly when the payload is relatively small, and switches to an irreversible mode to offer a higher payload. Although the synchronization between the audio-video and subtitle tracks is manipulated, any delay or advance in the displayed subtitles is imperceptible, and the file size of the processed MP4 file is completely preserved. Subjective evaluations verify the basic performance of the proposed method.
Citations: 1
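The timescale/duration mechanism can be illustrated with a toy reconstruction (not the paper's exact scheme): enlarge the track's time scale so each old tick becomes several new ticks, then hide one bit per sample as a one-tick tweak that is far below the threshold of perceptible subtitle desynchronization.

```python
def embed_bits(durations, bits, scale=10):
    """Toy duration-pair embedding: multiply the time scale by `scale`
    (each old tick = `scale` new ticks), then encode a 1-bit as +1 tick
    and a 0-bit as -1 tick on successive sample durations.
    Illustrative reconstruction only."""
    scaled = [d * scale for d in durations]
    stego = [d + (1 if b else -1) for d, b in zip(scaled, bits)]
    return stego + scaled[len(bits):]    # samples past the payload untouched

def extract_bits(stego, scale=10):
    """Residue 1 mod `scale` encodes a 1-bit, residue scale-1 a 0-bit;
    unmodified samples (residue 0) carry no payload."""
    bits = []
    for d in stego:
        r = d % scale
        if r == 1:
            bits.append(1)
        elif r == scale - 1:
            bits.append(0)
    return bits
```

In this toy form the embedding is also reversible: rounding each duration to the nearest multiple of `scale` recovers the cover values exactly, loosely mirroring the paper's reversible mode.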
Ensemble Deep Learning Based Cooperative Spectrum Sensing with Stacking Fusion Center
Hang Liu, Xu Zhu, T. Fujii
DOI: 10.23919/APSIPA.2018.8659774
Abstract: In this paper, an ensemble learning (EL) framework is adopted for cooperative spectrum sensing (CSS) in a cognitive radio system based on orthogonal frequency division multiplexing (OFDM) signals. Each secondary user (SU) is treated as a base learner whose local spectrum sensing estimates the probability of the primary user (PU) being active or inactive. Convolutional neural networks with a simple architecture are applied, given their strength in image recognition and the limited computational ability of each SU, and the cyclic spectral correlation feature is used as the input data. For supervised learning, a bagging strategy is used to build the training database. For the global decision, the fusion center employs stacked generalization to learn how to combine the SUs' classification outputs on the PU status. Our method shows significant advantages over conventional CSS methods in terms of detection probability and false-alarm probability.
Citations: 9
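The fusion step can be reduced to the following sketch, with the trained stacking combiner collapsed to a weighted vote over the base learners' local probabilities (the weights and threshold are illustrative; the paper learns the combiner).

```python
def stacking_decision(local_probs, meta_weights, threshold=0.5):
    """Fusion-centre sketch: each secondary user reports its local
    probability that the primary user is active; a pre-trained
    combiner -- reduced here to a weighted vote -- makes the global
    active/inactive decision."""
    score = sum(w * p for w, p in zip(meta_weights, local_probs))
    return score >= threshold
```

In the paper, this combiner is itself a learned model (stacked generalization), which lets the fusion center down-weight unreliable SUs automatically.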
Multichannel NMF with Reduced Computational Complexity for Speech Recognition
T. Izumi, Takanobu Uramoto, Shingo Uenohara, K. Furuya, Ryo Aihara, Toshiyuki Hanazawa, Y. Okato
DOI: 10.23919/APSIPA.2018.8659493
Abstract: In this study, we propose a method that reduces the number of computational iterations of multichannel NMF (MNMF) for speech recognition. The proposed method initializes the MNMF algorithm with an estimated spatial correlation matrix, which reduces the number of iterations needed by the update algorithm; here, mask-based enhancement via the expectation-maximization (EM) algorithm is used to estimate the spatial correlation matrix. As a further method, we propose reducing computational complexity by decimating the updates of the spatial correlation matrix H. The experimental results indicate that our methods reduced the computational complexity of MNMF while maintaining the performance of conventional MNMF.
Citations: 0
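The saving from decimating the spatial-correlation update can be seen in a schematic loop. The real updates are multiplicative MNMF rules over complex-valued spatial matrices; here they are only counted, to show where the complexity reduction comes from.

```python
def run_iterations(n_iters, update_every):
    """Schematic MNMF loop: the cheap basis/activation updates run
    every iteration, while the expensive spatial-correlation-matrix
    update (the dominant cost) runs only every `update_every` steps.
    Counts stand in for the actual update rules."""
    cheap_updates = 0
    expensive_updates = 0
    for it in range(n_iters):
        cheap_updates += 1            # basis/activation updates
        if it % update_every == 0:
            expensive_updates += 1    # decimated update of H
    return cheap_updates, expensive_updates
```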
A Digital Modeling Technique for Distortion Effect Based on a Machine Learning Approach
Yuto Matsunaga, N. Aoki, Y. Dobashi, Tsuyoshi Yamamoto
DOI: 10.23919/APSIPA.2018.8659547
Abstract: This paper describes experimental results of modeling distortion-effect stomp boxes with a machine learning approach. Our proposed technique models a distortion stomp box as a neural network consisting of a CNN and an LSTM. In this approach, the CNN is employed to model the linear components that appear in the pre- and post-filters of the stomp box, while the LSTM is employed to model the nonlinear component that appears in its distortion process. All parameters are estimated through training on the input and output signals of the distortion stomp box. The experimental results indicate that the proposed technique has the potential to replicate distortion stomp boxes appropriately with a well-trained neural network.
Citations: 2
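The linear/nonlinear decomposition the network learns corresponds to a classic signal chain, which can be sketched directly (filter taps and drive below are illustrative, not fitted to any real pedal):

```python
import math

def fir(x, taps):
    """Causal FIR filter by direct convolution."""
    return [sum(t * x[n - i] for i, t in enumerate(taps) if n - i >= 0)
            for n in range(len(x))]

def distortion_chain(signal, pre_taps, post_taps, drive=5.0):
    """Reference structure the network is trained to mimic:
    linear pre-filter -> memoryless soft clipper -> linear post-filter.
    In the paper, the linear parts are captured by the CNN and the
    nonlinear/stateful part by the LSTM."""
    pre = fir(signal, pre_taps)
    clipped = [math.tanh(drive * s) for s in pre]   # soft clipping
    return fir(clipped, post_taps)
```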
Sequential Generation of Singing F0 Contours from Musical Note Sequences Based on WaveNet
Yusuke Wada, Ryo Nishikimi, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii
DOI: 10.23919/APSIPA.2018.8659502
Abstract: This paper describes a method that can generate a continuous F0 contour of a singing voice from a monophonic sequence of musical notes (a musical score) by using the deep autoregressive model WaveNet. Real F0 contours include complicated temporal and frequency fluctuations caused by singing expressions such as vibrato and portamento. Although explicit models such as hidden Markov models (HMMs) have often been used to represent F0 dynamics, it is difficult for them to generate realistic F0 contours because of their limited representation capability. To overcome this limitation, WaveNet, which was invented for modeling raw waveforms in an unsupervised manner, was recently used to generate singing F0 contours from a musical score with lyrics in a supervised manner. Inspired by this attempt, we investigate the capability of WaveNet to generate singing F0 contours without lyric information. Our method conditions WaveNet on the pitch and contextual features of a musical score. As a loss function better suited to generating F0 contours, we adopted a modified cross-entropy loss weighted with the squared error between the target and output F0s on the log-frequency axis. The experimental results show that these techniques improve the quality of the generated F0 contours.
Citations: 7
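One plausible reading of the modified loss (an assumption; the paper's exact weighting may differ): keep the cross-entropy term on the target F0 bin, and additionally penalize probability mass in other bins by its squared distance to the target on the log-frequency axis.

```python
import math

def weighted_f0_loss(probs, target_bin, bin_freqs):
    """Illustrative loss over quantized F0 bins: standard cross-entropy
    on the target bin, plus off-target probability mass weighted by
    squared log-frequency distance, so near-misses (e.g. one semitone
    off) cost less than octave errors.  A hedged reconstruction of the
    loss described in the abstract, not the paper's exact formula."""
    log_t = math.log2(bin_freqs[target_bin])
    loss = 0.0
    for k, p in enumerate(probs):
        if k == target_bin:
            loss += -math.log(max(p, 1e-12))            # CE term
        else:
            dist = (math.log2(bin_freqs[k]) - log_t) ** 2
            loss += dist * p                            # distance penalty
    return loss
```

Unlike plain cross-entropy, which treats all wrong bins identically, this form makes the penalty grow with pitch distance.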
Exploring redundancy of HRTFs for fast training DNN-based HRTF personalization
Tzu-Yu Chen, Po-Wen Hsiao, T. Chi
DOI: 10.23919/APSIPA.2018.8659704
Abstract: A deep neural network (DNN) is constructed to predict the magnitude responses of a user's head-related transfer functions (HRTFs) for a specific direction and a specific ear. Using the CIPIC HRTF database (25 azimuth angles and 50 elevation angles for both ears), we trained 2,500 DNNs to predict the magnitude responses of all HRTFs of a user. To reduce training time, we propose using the final weights of the trained DNN of a nearby direction as the initial weights of the DNN currently under training, since the magnitude responses of HRTFs change smoothly across nearby directions. Analysis of variance (ANOVA) shows that, in terms of the log-spectral distortion (LSD) measure, the proposed training scheme produces magnitude responses equivalent to those of the standard scheme with random initial weights, while reducing training time by more than 95%.
Citations: 1
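The warm-start scheme reduces to a nearest-neighbour lookup over already-trained directions. Plain Euclidean distance over (azimuth, elevation) pairs is an illustrative simplification (it ignores azimuth wrap-around and the sphere's geometry).

```python
def initial_weights(direction, trained, default):
    """Warm-start from the abstract: initialise the DNN for a new
    direction with the final weights of the nearest already-trained
    direction; fall back to `default` (e.g. random init) when no
    direction has been trained yet.  `trained` maps (azimuth,
    elevation) pairs to weight objects."""
    if not trained:
        return default
    nearest = min(trained, key=lambda d: (d[0] - direction[0]) ** 2
                                         + (d[1] - direction[1]) ** 2)
    return trained[nearest]
```

Because neighbouring HRTF magnitude responses are similar, training from these weights starts near a good optimum, which is where the >95% time saving comes from.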
Journal Name Extraction from Japanese Scientific News Articles
M. Kikuchi, Mitsuo Yoshida, Kyoji Umemura
DOI: 10.23919/APSIPA.2018.8659765
Abstract: In Japanese scientific news articles, although research results are described clearly, the articles' sources tend to go uncited, which makes it difficult for readers to learn the details of the research. In this paper, we address the task of extracting journal names from Japanese scientific news articles. We hypothesize that a journal name is likely to occur in a specific context. To test this hypothesis, we construct a character-based extraction method that uses only the left and right context features of journal names. The results of the journal-name extraction suggest that the distributional hypothesis plays an important role in identifying journal names.
Citations: 0
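The kind of feature this method relies on can be sketched as a character window on each side of a candidate span (the window width and the English example text are illustrative; the paper works on Japanese):

```python
def context_features(text, start, end, width=3):
    """Left/right character context of a candidate journal-name span
    text[start:end] -- the only kind of feature the character-based
    method uses.  `width` is an illustrative choice."""
    left = text[max(0, start - width):start]
    right = text[end:end + width]
    return {"left": left, "right": right}
```

Extraction then scores spans by how often their contexts match contexts observed around known journal names, which is the distributional hypothesis in action.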