2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers, Page 7

Scrambling-Embedding in Partially-Encrypted Images

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979991

Koi Yee Ng, Simying Ong

Citations: 0

Teager Energy Cepstral Coefficients For Classification of Dysarthric Speech Severity-Level

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980322

Aastha Kachhi, Anand Therattil, Ankur T. Patil, Hardik B. Sailor, H. Patil

Abstract: Dysarthria is a neuro-motor speech impairment that renders speech unintelligible, and its severity level is generally hard for humans to perceive. Dysarthric speech classification acts as a diagnostic tool for tracking changes in a patient's severity condition and also aids automatic dysarthric speech recognition systems (an important assistive speech technology). This study investigates the significance of Teager Energy Cepstral Coefficients (TECC) in dysarthric speech classification using three deep learning architectures, namely, Convolutional Neural Network (CNN), Light-CNN (LCNN), and Residual Network (ResNet). The performance of TECC is compared with state-of-the-art features, such as the Short-Time Fourier Transform (STFT), Mel Frequency Cepstral Coefficients (MFCC), and Linear Frequency Cepstral Coefficients (LFCC). In addition, this study investigates the effectiveness of cepstral features over spectral features for this problem. The highest classification accuracies achieved on the UA-Speech corpus are 97.18%, 94.63%, and 98.02% (i.e., absolute improvements of 1.98%, 1.41%, and 1.69%) with CNN, LCNN, and ResNet, respectively, as compared to MFCC. Further, we evaluate feature discriminative capability using the F1-score, Matthews Correlation Coefficient (MCC), Jaccard index, and Hamming loss. Finally, analysis of the latency period w.r.t. state-of-the-art feature sets indicates the potential of TECC for practical deployment of the severity-level classification system.
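The abstract names the Teager energy as the basis of TECC but does not define it. As a minimal sketch, the standard discrete Teager energy operator is ψ[n] = x[n]² − x[n−1]·x[n+1]. The function name `teager_energy` is illustrative, and the filterbank and cepstral stages that a full TECC front end would need are not shown, since the abstract does not specify them.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the first and last samples
    have no neighbour on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure sinusoid A*cos(omega*n) the operator is constant and equals
# A^2 * sin(omega)^2, so it tracks amplitude and frequency jointly.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]
energy = teager_energy(signal)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Every entry of `energy` agrees with `expected` up to floating-point rounding, which follows from the product-to-sum identity for cosines.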

Citations: 1

Effective ASR Error Correction Leveraging Phonetic, Semantic Information and N-best hypotheses

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979951

Hsin-Wei Wang, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen

Abstract: Automatic speech recognition (ASR) has recently achieved remarkable success and reached human parity, thanks to synergistic breakthroughs in neural model architectures and training algorithms. However, the performance of ASR in many real-world use cases is still far from perfect. There has been a surge of research interest in designing and developing feasible post-processing modules that improve recognition performance by refining ASR output sentences. These modules fall roughly into two categories. The first category is ASR N-best hypothesis reranking, which aims to find the oracle hypothesis with the lowest word error rate in a given N-best hypothesis list. The other category takes inspiration from, for example, Chinese spelling correction (CSC) or English spelling correction (ESC), seeking to detect and correct text-level errors in ASR output sentences. In this paper, we attempt to integrate the above two methods into an ASR error correction (AEC) module and explore the impact of different kinds of features on AEC. Empirical experiments on the widely used AISHELL-1 dataset show that our proposed method can significantly reduce the word error rate (WER) of the baseline ASR transcripts in comparison with some top-of-the-line AEC methods, thereby demonstrating its effectiveness and practical feasibility.
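The reranking target described in the abstract, the oracle hypothesis with the lowest word error rate, can be made concrete with a small sketch. It assumes whitespace-separated tokens and a known reference transcript; the function names are illustrative and not taken from the paper. For Mandarin data such as AISHELL-1, tokens would typically be characters rather than words.

```python
def word_errors(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distance from an empty reference prefix
    for i, ref_word in enumerate(r, 1):
        cur = [i]  # distance to an empty hypothesis prefix
        for j, hyp_word in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                           # deletion
                           cur[j - 1] + 1,                        # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance divided by reference length."""
    return word_errors(ref, hyp) / max(len(ref.split()), 1)


def oracle_hypothesis(ref, nbest):
    """Return the hypothesis in the N-best list with the lowest WER."""
    return min(nbest, key=lambda hyp: wer(ref, hyp))


reference = "the cat sat on the mat"
nbest_list = ["the cat sat on a mat",
              "the cat sat on the mat",
              "cat sat on mat"]
best = oracle_hypothesis(reference, nbest_list)
```

The oracle can only be computed when the reference is available, so at inference time a reranker has to approximate this choice from features of the hypotheses alone.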

Citations: 2

Object Detection in Aerial Images with Attention-based Regression Loss

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980311

Chandler Timm C. Doloriel, R. Cajote

Citations: 0

Replay Attack Detection Based on Voice and Non-voice Sections for Speaker Verification

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980225

Ananda Garin Mills, Patthranit Kaewcharuay, Pannathorn Sathirasattayanon, Suradej Duangpummet, Kasorn Galajit, Jessada Karnjana, P. Aimmanee

Abstract: Voice can represent a person's identity. Thus, it can be used in automatic speaker verification (ASV) systems for authenticating secure applications. Unfortunately, existing ASV systems are vulnerable to spoofing attacks. A replay attack is a widely used spoofing technique because it is simple but difficult to detect. Hence, many countermeasures against replay attacks have been proposed. Most prior work does not separate voice and non-voice sections when assessing detection performance. In this work, we investigate spoof detection performance when the voice sections, the non-voice sections, and non-voice sections combined with different percentages of voice are used, in order to find the optimal section. We also propose a method for detecting replay attacks using the optimal section of a signal. Mel-frequency cepstral coefficients are calculated from the optimal section as a feature, and the ResNet-34 model is used for classification. We evaluated the proposed method using a dataset from the ASVspoof 2019 challenge. The results show that the optimal section for replay attack detection is obtained when 10% and 20% of voice are included in the non-voice sections. They also show that the proposed method outperforms the baselines with a 7.52% relative improvement, reaching an equal error rate of 1.72%.
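The result above is reported as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below approximates it by sweeping a threshold over the observed scores. It assumes that higher scores mean "more likely genuine"; the function name and the toy scores are illustrative and unrelated to the paper's data or to the official ASVspoof scoring tools.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # False acceptance: spoofed trials that pass the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trials that fall below the threshold.
        frr = sum(g < threshold for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, spoof)  # 0.25 for these toy scores
```

With these toy scores, a threshold of 0.6 accepts one of four spoofed trials and rejects one of four genuine trials, so both error rates equal 0.25.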

Citations: 3

Catastrophic forgetting avoidance method for a Classification Model by Model Synthesis and Introduction of Background Data

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980154

Hirayama Akari, Kimura Masaomi

Citations: 0

Simultaneous Frequency Estimation for Three or More Sinusoids Based on Sinusoidal Constraint Differential Equation

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980228

Kenta Yamada, Yoshiki Masuyama, Yukoh Wakabayashi, Nobutaka Ono

Citations: 1

Application of Deep Learning-based Single-channel Speech Enhancement for Frequency-modulation Transmitted Speech

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980216

Yingyi Ma, Xueliang Zhang

Abstract: There are three main interferences in the FM signal transmission process: the multipath effect, the Doppler effect, and white noise. These interferences significantly degrade speech. We propose a method that uses a masking or mapping approach for single-channel speech enhancement in wireless communication. Since the method improves speech quality by addressing all three interferences simultaneously, it is simpler than conventional methods. Experiments are conducted on a dataset that we simulated ourselves. Because PESQ and STOI need reference targets, it is hard to evaluate the performance on real-world data, so for real data we only give a spectral comparison of the enhancement results. Simulation results show excellent speech enhancement performance relative to the unprocessed mixture, and the method significantly improves speech quality on the actually collected data. This verifies the feasibility of deep learning for this kind of task. Future studies will aim to improve real-time performance and reduce the number of network parameters.

Citations: 0

Restoration of High-Frequency Components in Under Display Camera Images

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979964

Youngjin Oh, G. Park, N. Cho

Abstract: Under-Display Camera (UDC) systems have been developed to remove noticeable camera holes or notches and cover the front side entirely with the screen. As the name implies, UDCs are placed under the display, which these days is generally an organic light-emitting diode (OLED) panel. Since the OLED panel is not transparent and consists of circuits and display devices, the light reaching the camera suffers a loss of photons and a complicated point spread function (PSF). As a result, images obtained through a UDC system usually exhibit a color shift, decreased intensity, complex artifacts due to the PSF, and loss or distortion of high-frequency details. To overcome these degradations, we exploit a multi-stage image restoration network and a frequency loss function. The network uses deformable convolutions to handle the spatially variant degradations in UDC images, exploiting the fact that the kernel of a deformable convolution is dynamic and adapts to the input. We also apply a frequency reconstruction loss when training our models to better restore the high-frequency components lost to the complicated PSF. We show that our method effectively removes the degradation caused by the UDC system and achieves state-of-the-art performance on a benchmark dataset.

Citations: 0

An Approximated ADMM based Algorithm for $\ell_{1}-\ell_{2}$ Optimization Problem

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980002

Rui Lin, Kazunori Hayashi

Abstract: Compressed sensing is a technique for recovering a sparse vector from its underdetermined linear measurements. Since a naive $\ell_{0}$ optimization approach is hard to tackle due to the discreteness and non-convexity of the $\ell_{0}$ norm, a relaxed problem, the $\ell_{1}-\ell_{2}$ optimization, is often employed to reconstruct the sparse vector, especially when the measurement noise is not negligible. FISTA (fast iterative shrinkage-thresholding algorithm) is one of the popular algorithms for the $\ell_{1}-\ell_{2}$ optimization and is known to achieve the optimal convergence rate among first-order methods. Recently, the use of optical circuits for various signal processing tasks, including deep neural networks, has been considered intensively, but FISTA is difficult to implement with an optical circuit because the algorithm requires divisions by a dynamic value. In this paper, assuming an optical-circuit implementation, we propose an ADMM (alternating direction method of multipliers) based algorithm for the $\ell_{1}-\ell_{2}$ optimization. An ADMM based algorithm for the $\ell_{1}-\ell_{2}$ optimization has already been proposed in the literature, but the proposed algorithm is derived from a different formulation and, unlike the existing ADMM based algorithm, does not require computing the inverse of a matrix. Computer simulation results demonstrate that the proposed algorithm can achieve performance comparable to FISTA or the existing ADMM based algorithm while requiring no division operations and no matrix inversions.
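For orientation, the $\ell_{1}-\ell_{2}$ problem is usually written as minimizing ½‖y − Ax‖² + λ‖x‖₁; this formulation is assumed here, since the abstract does not spell it out. The sketch below is plain ISTA, the unaccelerated relative of FISTA, and is not the approximated ADMM algorithm proposed in the paper. It shows the soft-thresholding (shrinkage) step that the family shares. All names and the toy problem are illustrative.

```python
def soft_threshold(v, tau):
    """Shrink v toward zero by tau (proximal operator of tau * |v|)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def ista(A, y, lam, iters=500):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 with plain ISTA.

    A is a list of rows. The step size 1/L uses L = ||A||_F^2, which is an
    upper bound on the largest eigenvalue of A^T A and therefore a safe choice.
    """
    m, n = len(A), len(A[0])
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = y - A x.
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step direction g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by shrinkage.
        x = [soft_threshold(x[j] + g[j] / L, lam / L) for j in range(n)]
    return x


# With A equal to the identity, the minimizer is soft_threshold(y, lam)
# applied element-wise, which gives an easy correctness check.
A_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [3.0, 0.2]
x_hat = ista(A_toy, y_toy, lam=0.5)
```

In the toy problem the small measurement 0.2 is shrunk exactly to zero while the large one is reduced by λ, which is how the $\ell_{1}$ term promotes sparsity.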

Citations: 0