2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Skipped-Hierarchical Feature Pyramid Networks for Nuclei Instance Segmentation

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659795

Hyekyoung Hwang, T. Bui, Sang-il Ahn, Jitae Shin

Citations: 1

A Signal Separation Method for Physical Wireless Parameter Conversion Sensor Networks Using K-Shortest Path

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659631

Shuhei Yamasaki, Minato Oriuchi, O. Takyu, K. Shirai, T. Fujii, M. Ohta, F. Sasamori, S. Handa

Abstract: Wireless sensor networks (WSNs) need techniques that deliver low delay and high traffic performance. Physical wireless parameter conversion sensor networks (PhyC-SN) gather information from multiple sensors simultaneously, but separating the resulting mixed sensing results is a difficult problem. The proposed method uses an approach from multi-target tracking (MTT) to separate the mixed data points into per-sensor sequences. Specifically, we treat data separation as a path-planning problem: we form paths by connecting data points observed at adjacent times and find a set of continuous paths, each consisting of data points from the same sensor. Solving this problem yields as many paths as there are sensors, so all sensing results can be correctly discriminated and labeled over all times in the WSN. We therefore focus on the $k$-shortest path method of MTT. In this paper, we show the accuracy of signal separation through simulation experiments and evaluate it quantitatively in terms of the precision rate.
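The idea of connecting data points observed at adjacent times can be illustrated with a small sketch. This is not the paper's k-shortest path method: it is a simplified stand-in that, at each time step, picks the assignment of unlabeled observations to existing paths with the smallest total jump. The function name `link_frames` and the toy data are assumptions made for illustration only, and the brute-force search over permutations is practical only for a handful of sensors.

```python
from itertools import permutations


def link_frames(frames):
    """Link unlabeled observations across time into one path per sensor.

    frames[t] holds one value per sensor at time t, in unknown order.
    Returns a list of paths, each a list of values over time.
    """
    # Start one path per observation in the first frame.
    paths = [[value] for value in frames[0]]
    for observations in frames[1:]:
        last_values = [path[-1] for path in paths]
        # Try every ordering of the new observations and keep the one whose
        # total distance to the current path ends is smallest.
        best = min(
            permutations(observations),
            key=lambda order: sum(abs(a - b) for a, b in zip(last_values, order)),
        )
        for path, value in zip(paths, best):
            path.append(value)
    return paths


# Two hypothetical sensors whose readings arrive shuffled at each time step.
frames = [[1.0, 5.0], [5.2, 1.1], [1.3, 5.1], [4.9, 1.2]]
tracks = link_frames(frames)
print(tracks)
```

Each returned path collects the readings attributed to one sensor. A per-step assignment like this cannot recover from an early wrong link, which is one reason a formulation that searches over whole paths may be preferable.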

Citations: 0

Discriminative Feature Extraction Based on Sequential Variational Autoencoder for Speaker Recognition

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659722

Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda

Citations: 1

Implication of speech level control in noise to sound quality judgement

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659672

Sara Akbarzadeh, Sungmin Lee, Satnam Singh, Chin-Tuan Tan

Abstract: The relative level of speech and noise, i.e. the signal-to-noise ratio (SNR), may not on its own fully account for how humans perceive speech in noise or judge the sound quality of the speech component. To date, the most common rationale in front-end processing of noisy speech in assistive hearing devices is to reduce the (estimated) “noise” with the sole objective of improving the overall SNR. The absolute sound pressure level of speech in the remaining noise, which listeners need to anchor their perceptual judgement, is assumed to be restored by the subsequent dynamic range compression stage, which is intended to compensate for loudness recruitment in hearing-impaired (HI) listeners. However, uncoordinated settings of the thresholds that trigger the nonlinear processing in these two separate stages amplify the remaining “noise” and/or distortion instead. This confuses listeners' judgement of sound quality and makes it deviate from the usual perceptual trend expected when more noise is present. In this study, both normal-hearing (NH) and HI listeners were asked to rate the perceived sound quality of noisy speech and noise-reduced speech. The results showed that speech processed by noise reduction algorithms was rated lower in quality than the original unprocessed speech in noise. They also showed that sound quality judgement depended on both input SNR and the absolute level of speech, with a greater weight on the latter, for both NH and HI listeners. These outcomes suggest that integrating the two separate processing stages into one may better match the underlying mechanism of auditory sound reception. Further work will attempt to identify settings of these two processing stages that give better speech reception for users of assistive hearing devices.
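The distinction between SNR and absolute speech level can be made concrete with a short sketch. It assumes synthetic sine signals and RMS-based levels relative to an arbitrary reference of 1.0; the helper names `rms`, `snr_db`, and `level_db` are illustrative and do not come from the paper. Scaling speech and noise by the same gain leaves the SNR unchanged while lowering the absolute level of the speech.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB, from the RMS of each component."""
    return 20 * math.log10(rms(speech) / rms(noise))


def level_db(signal, ref=1.0):
    """Level in dB relative to an arbitrary reference amplitude (assumed 1.0)."""
    return 20 * math.log10(rms(signal) / ref)


# Synthetic stand-ins for speech and noise (assumed, for illustration only).
speech = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noise = [0.1 * math.sin(2 * math.pi * 17 * n / 100 + 0.3) for n in range(100)]

# Apply the same gain to both components: the mixture is simply quieter.
gain = 0.5
quiet_speech = [gain * v for v in speech]
quiet_noise = [gain * v for v in noise]

print(snr_db(speech, noise), snr_db(quiet_speech, quiet_noise))
print(level_db(speech), level_db(quiet_speech))
```

The two SNR values agree, while the speech level drops by about 6 dB, so an SNR-only metric cannot distinguish the two conditions even though the absolute level differs.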

Citations: 3

Probabilistic Sequential Patterns for Singing Transcription

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659637

Eita Nakamura, Ryo Nishikimi, S. Dixon, Kazuyoshi Yoshii

Abstract: Statistical models of musical scores play an important role in various tasks of music information processing. It has been an open problem to construct a score model incorporating global repetitive structure of note sequences, which is expected to be useful for music transcription and other tasks. Since repetitions can be described by a sparse distribution over note patterns (segments of music), a possible solution is to consider a Bayesian score model in which such a sparse distribution is first generated for each individual piece and then musical notes are generated in units of note patterns according to the distribution. However, straightforward construction is impractical due to the enormous number of possible note patterns. We propose a probabilistic model that represents a cluster of note patterns, instead of explicitly dealing with the set of all possible note patterns, to attain computational tractability. A score model is constructed as a mixture or a Markov model of such clusters, which is compatible with the above framework for describing repetitive structure. As a practical test to evaluate the potential of the model, we consider the problem of singing transcription from vocal f0 trajectories. Evaluation results show that our model achieves better predictive ability and transcription accuracies compared to the conventional Markov model, nearly reaching state-of-the-art performance.

Citations: 4

Estimation of glottal source waveforms and vocal tract shape for singing voices with wide frequency range

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659480

K. Takahashi, M. Akagi

Abstract: Estimation of glottal vibration and vocal tract shape for singing voices is necessary for clarifying the mechanism of singing voice production. However, accurate estimation of glottal vibration and vocal tract shape in singing voices with a high fundamental frequency (f0) is difficult using simulated models such as the auto-regressive with exogenous input (ARX) model and the Liljencrants-Fant (LF) model. This is caused by two problems: the inaccurate estimation method of the glottal closure instant (GCI) and the inappropriate estimation method of ARX model parameter values in singing voices with high f0. The proposed method therefore aims to accurately estimate glottal source waveforms and vocal tract shape for singing voices with a wide frequency range. To achieve this objective, we propose two solutions: estimation of GCI using an electroglottogram (EGG) signal, and estimation of ARX model parameter values using multi-stage optimization and an evaluation function including the leaking effect from forwarded periods. Experiments using simulated and real singing voices indicated that the proposed method achieves accurate estimation of GCI, reliable estimation of the ARX model parameter values for singing voices with high f0, and estimation of glottal vibration and vocal tract shape in singing voices with a wide frequency range.

Citations: 3

Chatting Application Monitoring on Android System and its Detection based on the Correlation Test

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659583

Yafei Li, Jiageng Chen, A. Ho

Abstract: Mobile phones play an important role in modern digital society and have already replaced the traditional computer in many situations. Nevertheless, the amount of malicious software has also started to grow and has shown a significant impact on legitimate use. Among mobile systems, the Android platform is currently the most widely used and most open, which also makes it a very attractive target for malicious applications. User privacy is of great interest to many different agents, which makes it one of the most valuable targets for malware, and chat software naturally becomes one of the richest sources of information. In this paper, we first investigate the core techniques used by most monitoring software. We then propose several correlation experiments to efficiently detect such software. We developed a monitoring prototype as well as a detection system, including the mobile phone side and the remote web server side, to simulate the scenario in a real-world environment. The experiment confirmed the efficiency of our approach.

Citations: 2

A Survey on Replay Attack Detection for Automatic Speaker Verification (ASV) System

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659666

H. Patil, Madhu R. Kamble

Abstract: In this paper, we present a brief survey of various approaches used to detect replay attacks on Automatic Speaker Verification (ASV) systems. The replay spoofing attack is the most challenging to detect, as only a few seconds of audio are required to replay a genuine speaker's voice. Owing to the wide availability and widespread use of mobile/smart gadgets and recording devices, it is easy to record and replay a genuine speaker's voice. The challenge in replay spoof detection is to detect the differences in acoustical characteristics between the natural and replayed versions of the speech signal. The speech signal recorded with the playback device contains convolutional and additive distortions from the intermediate device. Background noise and channel degradations seriously constrain the performance of the system. The goal of this paper is to provide an overview of the replay attack, focusing on the 2nd ASVspoof 2017 challenge, which is an emerging research problem in the field of anti-spoofing. This paper presents a critical analysis of state-of-the-art techniques, various countermeasures, and databases, and also aims to present current limitations along with the road map ahead, i.e., future research directions for this technologically challenging problem.

Citations: 30

Nonlinear Online Learning — A Kernel SMF Approach

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659670

Kewei Chen, Stefan Werner, A. Kuh, Yih-Fang Huang

Citations: 2

SILK Steganography Scheme Based on the Distribution of LSF Parameter

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2018-11-01 DOI: 10.23919/APSIPA.2018.8659509

Yanzhen Ren, Weiman Zheng, Lina Wang

Abstract: SILK is a speech codec for real-time packet-based voice communications that is widely used in many popular mobile Internet applications, such as Skype, WeChat, QQ, and WhatsApp, which makes it a novel and ideal carrier for information hiding. In this paper, a secure steganography scheme for SILK is proposed, which embeds the secret message by modifying the LSF (Line Spectral Frequency) quantization indices based on the statistical distribution of the LSF codebook. The experimental results show that the auditory concealment of the proposed scheme is excellent: the decrease in PESQ is very small. The average hiding capacity reaches 129 bps and 223 bps at sampling rates of 8 kHz and 16 kHz, respectively. More importantly, the proposed scheme has good statistical security: the statistical distribution of the LSF codebook is used as a constraint to keep the distribution of the stego codewords close to that of the cover audio. Under a steganalysis scheme adapted from an existing steganalysis scheme for G.723.1, the average correct detection rate is under 55.4% for both cover and stego audio. To the best of our knowledge, this is the first work to hide information in SILK. Because the underlying speech compression principle is similar, the method can be extended to other CELP codecs, such as G.723.1, G.729, and AMR.

Citations: 5