2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers, Page 4

Time-Frequency Mask-based Speech Enhancement using Convolutional Generative Adversarial Network

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659692

Neil Shah, H. Patil, Meet H. Soni

Abstract: A Speech Enhancement (SE) system aims to improve the perceptual quality and preserve the speech intelligibility of a noisy mixture. Time-Frequency (T-F) masking-based SE using a supervised learning algorithm, such as a Deep Neural Network (DNN), has outperformed traditional SE techniques. However, the notable difference observed between the oracle mask and the predicted mask motivates us to explore different deep learning architectures. In this paper, we propose to use a Convolutional Neural Network (CNN)-based Generative Adversarial Network (GAN) for inherent mask estimation. The GAN takes advantage of adversarial optimization, an alternative to Maximum Likelihood (ML) optimization-based architectures. We also show the need for supervised T-F mask estimation for effective noise suppression. Experimental results demonstrate that the proposed T-F mask-based SE significantly outperforms the recently proposed end-to-end SEGAN and a GAN-based Pix2Pix architecture. The performance evaluation, in terms of both the predicted mask and the objective measures, indicates an improvement in speech quality together with a reduction in the speech distortion observed in the noisy mixture.

Citations: 32
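
The T-F masking idea described in the abstract of the entry above can be illustrated with a minimal Python sketch. It assumes the oracle mask is an ideal ratio mask computed from clean-speech and noise magnitude spectrograms; the abstract does not say which oracle mask the authors use, and the names `ideal_ratio_mask` and `apply_mask` are hypothetical. The GAN-based mask estimator itself is not shown.

```python
def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Oracle mask per T-F bin: speech energy / (speech energy + noise energy).

    speech_mag and noise_mag are lists of frames; each frame is a list of
    magnitudes, one per frequency bin. This mask definition is an assumption.
    """
    return [
        [s * s / (s * s + n * n + eps) for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech_mag, noise_mag)
    ]


def apply_mask(mask, noisy_mag):
    """Scale each noisy T-F bin by its mask value (between 0 and 1)."""
    return [
        [m * y for m, y in zip(m_frame, y_frame)]
        for m_frame, y_frame in zip(mask, noisy_mag)
    ]
```

In a trained system the mask would be predicted from the noisy input alone; the oracle version here only shows the target that a mask estimator is asked to approximate.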

Block Tensor Train Decomposition for Missing Value Imputation

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659560

Namgil Lee

Citations: 0

On the Comparative Effect of Snowfall, Accumulation, and Density on Speech Intelligibility

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659782

Shuto Shibata, K. Kondo

Abstract: Sound is known to be altered in some manner by the acoustic characteristics of snow. However, the specific characteristics of snow that actually affect the acoustical transfer characteristics are not clearly understood. These transfer characteristics are crucial in disaster prevention radio broadcasting systems that warn citizens working outdoors of potential natural disasters during the winter in regions with heavy snow. These systems use extremely high-output horn speakers to convey warning messages over a large area. Accordingly, the purpose of this research is to clarify how speech intelligibility is influenced by the amount of snowfall, its accumulation, and the snow density. In this research, outdoor impulse response measurements were carried out during actual snowfall. We measured and compiled the transfer characteristics under several snow conditions and convolved them with test speech in order to simulate the transmitted speech quality during snow. We conducted a Japanese speech intelligibility test using these speech samples and clarified the effect of each snow quality measure using multivariate analysis. As a result, it was found that although there is some influence of the amount of snowfall and density, the influence of the amount of snowfall becomes dominant as the distance between the loudspeaker and the listener (microphone) becomes large.

Citations: 0
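
The simulation step mentioned in the abstract of the entry above, convolving a measured impulse response with test speech, can be sketched as a direct discrete convolution. This is a plain-Python illustration under stated assumptions: the function name `convolve` and the sample values are hypothetical, and the authors' actual measurement and processing chain is not described in the listing.

```python
def convolve(signal, impulse_response):
    """Direct linear convolution of two finite sample sequences.

    The output has len(signal) + len(impulse_response) - 1 samples when
    both inputs are non-empty.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample excites a delayed, scaled copy of the response.
            out[i + j] += x * h
    return out
```

For example, `convolve([1.0, 2.0, 3.0], [1.0, 0.5])` returns `[1.0, 2.5, 4.0, 1.5]`. For recordings of realistic length an FFT-based convolution would be the usual choice; the direct form is used here only because it is easy to read.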

Multi-View and Multi-Modal Action Recognition with Learned Fusion

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659539

Sandy Ardianto, H. Hang

Citations: 9

Cocoa bean quality assessment using closed range hyperspectral images

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659490

Oswaldo Bayona, Daniel Ochoa, Ronald Criollo, J. Cevallos-Cevallos, Wenzi Liao

Citations: 2

A Block-Permutation-Based Image Encryption Allowing Hierarchical Decryption

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659479

Yusuke Izawa, Shoko Imaizumi, H. Kiya

Citations: 0

A Speech Processing Strategy based on Sinusoidal Speech Model for Cochlear Implant Users

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659620

Sungmin Lee, Sara Akbarzadeh, Satnam Singh, Chin-Tuan Tan

Abstract: In sinusoidal modeling (SM), a speech signal, which is pseudo-periodic in structure, can be approximated by sinusoids and noise without losing significant speech information. A speech processing strategy based on this sinusoidal speech model is relevant for encoding electric pulse streams in cochlear implant (CI) processing, where the number of available channels is limited. In this study, 5 normal hearing (NH) listeners and 2 CI users were asked to perform speech recognition and perceived sound quality rating tasks on speech sentences processed in 12 different test conditions. The sinusoidal analysis/synthesis algorithm was limited to 1, 3, or 6 sinusoids from sentences low-pass filtered at 1 kHz, 1.5 kHz, 3 kHz, or 6 kHz, which were re-synthesized as the test conditions. Each of 12 lists of AzBio sentences was randomly chosen and processed with one of the 12 test conditions before being presented to each participant at 65 dB SPL (Sound Pressure Level). Participants were instructed to repeat each sentence as they perceived it, and the number of words correctly recognized was scored. They were also asked to rate the perceived sound quality of the sentences, including the original speech sentence, on a scale of 1 (distorted) to 10 (clean). Both the speech recognition score and the perceived sound quality rating across all participants increased as the number of sinusoids increased and the low-pass filter broadened. Our current finding showed that three sinusoids may be sufficient to elicit nearly maximum speech intelligibility and quality for both NH and CI listeners. The sinusoidal speech model has the potential to serve as the basis for a speech processing strategy in CI.

Citations: 0
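
The test conditions described in the abstract of the entry above (1, 3, or 6 sinusoids combined with four low-pass cutoffs) can be sketched in Python as follows. This is only an illustration under assumptions: partials are represented as hypothetical `(amplitude, frequency_hz, phase)` tuples, the rule of keeping the strongest in-band partials is assumed rather than taken from the paper, and the analysis stage that would estimate the partials from speech is not shown.

```python
import math

# The 3 x 4 grid of conditions named in the abstract (12 in total).
TEST_CONDITIONS = [
    (num_sinusoids, cutoff_hz)
    for num_sinusoids in (1, 3, 6)
    for cutoff_hz in (1000.0, 1500.0, 3000.0, 6000.0)
]


def select_partials(partials, max_count, cutoff_hz):
    """Keep the max_count strongest partials at or below cutoff_hz.

    Selecting by amplitude is an assumption made for this sketch.
    """
    in_band = [p for p in partials if p[1] <= cutoff_hz]
    return sorted(in_band, key=lambda p: p[0], reverse=True)[:max_count]


def synthesize(partials, sample_rate, num_samples):
    """Re-synthesize a frame as a sum of fixed-parameter sinusoids."""
    return [
        sum(
            amp * math.sin(2.0 * math.pi * freq * n / sample_rate + phase)
            for amp, freq, phase in partials
        )
        for n in range(num_samples)
    ]
```

A frame would be processed for one condition by calling `select_partials` with that condition's count and cutoff, then passing the result to `synthesize`.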

Weakly Labeled Learning Using BLSTM-CTC for Sound Event Detection

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659528

Taiki Matsuyoshi, Tatsuya Komatsu, Reishi Kondo, Takeshi Yamada, S. Makino

Abstract: In this paper, we propose a method of weakly labeled learning of bidirectional long short-term memory (BLSTM) using connectionist temporal classification (BLSTM-CTC) to reduce the hand-labeling cost of learning samples. BLSTM-CTC enables us to update the parameters of the BLSTM by loss calculation using CTC, instead of the exact error calculation, which cannot be conducted with weakly labeled samples that have only the event class of each individual sound event. In the proposed method, we first conduct strongly labeled learning of the BLSTM as initial learning, using a small amount of strongly labeled samples, which have the timestamps of the beginning and end of each individual sound event and its event class. We then conduct weakly labeled learning based on BLSTM-CTC as additional learning, using a large amount of weakly labeled samples. To evaluate the performance of the proposed method, we conducted a sound event detection experiment using the dataset provided by Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 2. As a result, the proposed method improved the segment-based F1 score by 1.9% compared with the initial learning mentioned above. Furthermore, it succeeded in reducing the labeling cost by 95%, although the F1 score was degraded by 1.3% compared with additional learning using a large amount of strongly labeled samples. This result confirms that our weakly labeled learning is effective for learning a BLSTM with a low hand-labeling cost.

Citations: 4
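
The role of CTC in the abstract of the entry above, computing a loss from an event-class sequence without timestamps, can be sketched with the standard CTC forward recursion in plain Python. This is a generic illustration, not the authors' implementation: the function names, the choice of index 0 as the blank symbol, and the toy probabilities are assumptions, and the BLSTM that would produce the per-frame probabilities is not shown.

```python
def collapse(path, blank=0):
    """Map a frame-level path to its label sequence: merge repeats, drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return out


def ctc_probability(frame_probs, labels, blank=0):
    """Total probability of all frame-level paths that collapse to labels.

    frame_probs[t][k] is the probability of symbol k at frame t.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank].
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = frame_probs[0][blank]
    if size > 1:
        alpha[1] = frame_probs[0][ext[1]]
    for t in range(1, len(frame_probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame_probs[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)
```

With two frames, `ctc_probability([[0.4, 0.6], [0.3, 0.7]], [1])` sums the three paths `(1, 1)`, `(0, 1)`, and `(1, 0)` and gives 0.88 up to floating-point rounding; the training loss would be the negative logarithm of this value.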

A Study on Indoor Dimming Method Utilizing Outside Light for Power Saving

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659602

Kengo Sasaki, E. Okamoto

Citations: 0

A Diversification Strategy for IIR Filter Design Using PSO

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659771

Y. Takase, K. Suyama

Citations: 2