{"title":"Time-Frequency Mask-based Speech Enhancement using Convolutional Generative Adversarial Network","authors":"Neil Shah, H. Patil, Meet H. Soni","doi":"10.23919/APSIPA.2018.8659692","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659692","url":null,"abstract":"A Speech Enhancement (SE) system deals with improving the perceptual quality and preserving the speech intelligibility of the noisy mixture. Time-Frequency (T-F) masking-based SE using a supervised learning algorithm, such as a Deep Neural Network (DNN), has outperformed the traditional SE techniques. However, the notable difference observed between the oracle mask and the predicted mask motivates us to explore different deep learning architectures. In this paper, we propose to use a Convolutional Neural Network (CNN)-based Generative Adversarial Network (GAN) for inherent mask estimation. The GAN takes advantage of adversarial optimization, an alternative to the other Maximum Likelihood (ML) optimization-based architectures. We also show the need for supervised T-F mask estimation for effective noise suppression. Experimental results demonstrate that the proposed T-F mask-based SE significantly outperforms the recently proposed end-to-end SEGAN and a GAN-based Pix2Pix architecture. The performance evaluation in terms of both the predicted mask and the objective measures indicates an improvement in the speech quality, while simultaneously reducing the speech distortion observed in the noisy mixture.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130724429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Block Tensor Train Decomposition for Missing Value Imputation","authors":"Namgil Lee","doi":"10.23919/APSIPA.2018.8659560","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659560","url":null,"abstract":"We propose a new method for imputation of missing values in large scale matrix data based on a low-rank tensor approximation technique called the block tensor train (TT) decomposition. Given sparsely observed data points, the proposed method iteratively computes the soft-thresholded singular value decomposition (SVD) of the underlying data matrix with missing values. The SVD of matrices is performed based on a low-rank block TT decomposition for large scale data matrices with a low-rank tensor structure. Experimental results on simulated data demonstrate that the proposed method can estimate a large amount of missing values accurately compared to a matrix-based standard method.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133519781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Comparative Effect of Snowfall, Accumulation, and Density on Speech Intelligibility","authors":"Shuto Shibata, K. Kondo","doi":"10.23919/APSIPA.2018.8659782","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659782","url":null,"abstract":"Sound is known to be altered in some manner by the acoustic characteristics of snow. However, the specific characteristics of snow that actually affect the acoustical transfer characteristics are not clearly understood. These transfer characteristics will be crucial in disaster prevention radio broadcasting systems that warn citizens working outdoors of potential natural disasters during the winter in regions with heavy snow. These systems use extremely high-output horn speakers to convey the warning messages to a large area. Accordingly, the purpose of this research is to clarify how the speech intelligibility will be influenced by the amount of snowfall, its accumulation, and the snow density. In this research, outdoor impulse response measurements were carried out during snowfall. We measured and compiled the transfer characteristics under several snow conditions and convolved these with test speech in order to simulate the transmitted speech quality during snow. We conducted a Japanese speech intelligibility test using these speech samples and clarified the effect of each snow quality measure using multivariate analysis. As a result, it was found that although there is some influence of the amount of snowfall and density, the influence of the amount of snowfall becomes dominant as the distance between the loudspeaker and the listener (microphone) becomes large.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133549242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-View and Multi-Modal Action Recognition with Learned Fusion","authors":"Sandy Ardianto, H. Hang","doi":"10.23919/APSIPA.2018.8659539","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659539","url":null,"abstract":"In this paper, we study a multi-modal and multi-view action recognition system based on deep-learning techniques. We extended the Temporal Segment Network with an additional data fusion stage to combine information from different sources. In this research, we use multiple types of information from different modalities, such as RGB, depth, and infrared data, to detect predefined human actions. We tested various combinations of these data sources to examine their impact on the final detection accuracy. We designed 3 information fusion methods to generate the final decision. The most interesting one is the Learned Fusion Net designed by us. It turns out the Learned Fusion structure has the best results but requires more training.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133302894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cocoa bean quality assessment using closed range hyperspectral images","authors":"Oswaldo Bayona, Daniel Ochoa, Ronald Criollo, J. Cevallos-Cevallos, Wenzi Liao","doi":"10.23919/APSIPA.2018.8659490","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659490","url":null,"abstract":"Farmers mix high and low quality cocoa beans to increase their income at the expense of chocolate flavor. We use closed range hyperspectral images to recognize two common varieties of cocoa beans at various fermentation stages. Several image calibration issues are addressed in this paper to reduce the effect of the bean's shape in the reflectance image estimation and specular patches on the bean's surface. Fusion and feature extraction techniques were exploited for bean classification. From our experimental results, we noticed that the biochemical processes during fermentation of each bean type influence their spectral signatures, enabling increasingly better discrimination. We found that spectral indexes related to the anthocyanin reflectance index yield a high discriminant rate, particularly at later fermentation stages. These findings suggest that bean classification is possible and could be adopted as the standard method for fast bean quality assessment.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132088500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Block-Permutation-Based Image Encryption Allowing Hierarchical Decryption","authors":"Yusuke Izawa, Shoko Imaizumi, H. Kiya","doi":"10.23919/APSIPA.2018.8659479","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659479","url":null,"abstract":"This paper proposes a block-permutation-based encryption (BPBE) scheme, which allows only decrypting particular regions in the encrypted image. It is difficult to perform partial decryption in the conventional scheme, because it encrypts the entire image at once. By composing regions in the original image, we can conduct the hierarchical encryption and achieve the partial decryption in the proposed scheme. Additionally, the proposed scheme can maintain the JPEG-LS compression efficiency of the encrypted images compared to the conventional scheme. Moreover, the resilience against jigsaw puzzle solving problems can be enhanced by applying the proposed scheme to the combined images. We further consider an efficient key management by using hash chains.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127869386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Speech Processing Strategy based on Sinusoidal Speech Model for Cochlear Implant Users","authors":"Sungmin Lee, Sara Akbarzadeh, Satnam Singh, Chin-Tuan Tan","doi":"10.23919/APSIPA.2018.8659620","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659620","url":null,"abstract":"In sinusoidal modeling (SM), a speech signal, which is pseudo-periodic in structure, can be approximated by sinusoids and noise without losing significant speech information. A speech processing strategy based on this sinusoidal speech model will be relevant for encoding electric pulse streams in cochlear implant (CI) processing, where the number of channels available is limited. In this study, 5 normal hearing (NH) listeners and 2 CI users were asked to perform the task of speech recognition and perceived sound quality rating on speech sentences processed in 12 different test conditions. The sinusoidal analysis/synthesis algorithm was limited to 1, 3 or 6 sinusoids from the sentences low-pass filtered at either 1 kHz, 1.5 kHz, 3 kHz, or 6 kHz, re-synthesized as the test conditions. Each of 12 lists of AzBio sentences was randomly chosen and processed with one of 12 test conditions before being presented to each participant at 65 dB SPL (Sound Pressure Level). Participants were instructed to repeat the sentence as they perceived it, and the number of words correctly recognized was scored. They were also asked to rate the perceived sound quality of the sentences, including the original speech sentence, on a scale of 1 (distorted) to 10 (clean). Both the speech recognition score and the perceived sound quality rating across all participants increase when the number of sinusoids increases and the low-pass filter broadens. Our current finding showed that three sinusoids may be sufficient to elicit the nearly maximum speech intelligibility and quality necessary for both NH and CI listeners. The sinusoidal speech model has the potential to provide the basis for a speech processing strategy in CI.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127386133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Labeled Learning Using BLSTM-CTC for Sound Event Detection","authors":"Taiki Matsuyoshi, Tatsuya Komatsu, Reishi Kondo, Takeshi Yamada, S. Makino","doi":"10.23919/APSIPA.2018.8659528","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659528","url":null,"abstract":"In this paper, we propose a method of weakly labeled learning of bidirectional long short-term memory (BLSTM) using connectionist temporal classification (BLSTM-CTC) to reduce the hand-labeling cost of learning samples. BLSTM-CTC enables us to update the parameters of BLSTM by loss calculation using CTC, instead of the exact error calculation that cannot be conducted when using weakly labeled samples, which have only the event class of each individual sound event. In the proposed method, we first conduct strongly labeled learning of BLSTM using a small amount of strongly labeled samples, which have the timestamps of the beginning and end of each individual sound event and its event class, as initial learning. We then conduct weakly labeled learning based on BLSTM-CTC using a large amount of weakly labeled samples as additional learning. To evaluate the performance of the proposed method, we conducted a sound event detection experiment using the dataset provided by Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 2. As a result, the proposed method improved the segment-based F1 score by 1.9% compared with the initial learning mentioned above. Furthermore, it succeeded in reducing the labeling cost by 95%, although the F1 score was degraded by 1.3%, compared with additional learning using a large amount of strongly labeled samples. This result confirms that our weakly labeled learning is effective for learning BLSTM with a low hand-labeling cost.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133795308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Indoor Dimming Method Utilizing Outside Light for Power Saving","authors":"Kengo Sasaki, E. Okamoto","doi":"10.23919/APSIPA.2018.8659602","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659602","url":null,"abstract":"In next-generation power networks, more energy-saving and energy-efficient networks are required. One of the solutions is a location-aware energy distribution scheme, where a person's location is accurately estimated by a centimeter-order indoor localization scheme and the energy is preferentially allocated to the electric equipment near that person. One of its applications is an energy-saving indoor lighting control scheme that exploits the person's location information and the estimated illumination intensity, and large energy saving effects are obtained. We have proposed an indoor dimming scheme that considers external light in previous studies. However, in the previous study, advance intensity measurements at many reference points were required. Therefore, in this paper, we propose an energy-saving indoor lighting control method that uses an estimated external light to reduce the measurement points. Numerical results show the advanced performance of the proposed method.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"323 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124295125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Diversification Strategy for IIR Filter Design Using PSO","authors":"Y. Takase, K. Suyama","doi":"10.23919/APSIPA.2018.8659771","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659771","url":null,"abstract":"IIR (Infinite Impulse Response) filter design problem is a non-linear optimization problem. Because PSO (Particle Swarm Optimization) can enumerate solution candidates quickly, it is known as an effective method for such a problem. However, PSO has a drawback that tends to indicate a premature convergence due to a strong directivity. In this paper, PSS (Problem Space Stretch)-PSO is verified to avoid the local minimum stagnation. Several design examples are shown to present the effectiveness of the method.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116006038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}