2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Scrambling-Embedding in Partially-Encrypted Images
Koi Yee Ng, Simying Ong
{"title":"Scrambling-Embedding in Partially-Encrypted Images","authors":"Koi Yee Ng, Simying Ong","doi":"10.23919/APSIPAASC55919.2022.9979991","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979991","url":null,"abstract":"In this paper, an improved scrambling-embedding technique, namely a row-rotational-based data hiding method, is proposed to hide data in partially-encrypted images. The partially-encrypted images are generated by performing bit-wise XOR-cipher to investigate the feasibility of applying the proposed method in various encryption levels. The proposed method is performed by dividing each row into multiple non-overlapping continuous partitions. These partitions will be arranged in a rotational manner to create different states, while each state will be used to represent specific data in binary representation. During the decoding process, α notation is introduced to reduce the number of failure rows, which will cause further image degradation and incorrect data extraction. The BSDS300 dataset is utilized for experiments, and encrypted with different encryption strengths. From the experiment results, it is observed that when least significant bits are encrypted, the proposed data hiding method using scrambling-embedding technique can still perform as well as in the plain image domain.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129375033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
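The row-rotation idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the decoder identifies the rotation state, so the sum-of-absolute-differences test against a neighbouring reference row used here is an assumption, as are all function names; the α notation and the XOR encryption step are not modelled.

```python
import numpy as np

def rotate_partitions(row, num_partitions, state):
    """Cyclically shift the partitions of one image row by `state` positions."""
    row = np.asarray(row)
    # Assumes the row length is divisible by num_partitions.
    parts = row.reshape(num_partitions, -1)
    return np.roll(parts, state, axis=0).reshape(-1)

def embed_bits(row, bits, num_partitions=4):
    """Hide a bit string such as '10' by selecting the matching rotation state."""
    state = int(bits, 2)
    if state >= num_partitions:
        raise ValueError("payload needs more states than partitions available")
    return rotate_partitions(row, num_partitions, state)

def decode_state(marked_row, reference_row, num_partitions=4):
    """Hypothetical decoder: undo every possible rotation and keep the
    candidate that is closest to a neighbouring (reference) row."""
    ref = np.asarray(reference_row, dtype=np.int64)
    costs = []
    for s in range(num_partitions):
        cand = rotate_partitions(marked_row, num_partitions, -s).astype(np.int64)
        costs.append(np.abs(cand - ref).sum())
    return int(np.argmin(costs))

# Toy example: a smooth ramp stands in for two neighbouring image rows.
reference = np.arange(16)
row = reference + 1
marked = embed_bits(row, "10")            # state 2 of 4
recovered_state = decode_state(marked, reference)
restored = rotate_partitions(marked, 4, -recovered_state)
```

With four partitions each row carries two bits. On real images the closest-candidate test can pick the wrong state for some rows, which is presumably the "failure rows" case the abstract mentions.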
Teager Energy Cepstral Coefficients For Classification of Dysarthric Speech Severity-Level
Aastha Kachhi, Anand Therattil, Ankur T. Patil, Hardik B. Sailor, H. Patil
{"title":"Teager Energy Cepstral Coefficients For Classification of Dysarthric Speech Severity-Level","authors":"Aastha Kachhi, Anand Therattil, Ankur T. Patil, Hardik B. Sailor, H. Patil","doi":"10.23919/APSIPAASC55919.2022.9980322","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980322","url":null,"abstract":"Dysarthria is a neuro-motor speech impairment that renders speech unintelligible, which is generally imperceptible to humans w.r.t. severity-levels. Dysarthric speech classification acts as a diagnostic tool for evaluating the advancement in a patient's severity condition and also aids in automatic dysarthric speech recognition systems (an important assistive speech technology). This study investigates the significance of Teager Energy Cepstral Coefficients (TECC) in dysarthric speech classification using three deep learning architectures, namely, Convolutional Neural Network (CNN), Light-CNN (LCNN), and Residual Networks (ResNet). The performance of TECC is compared with state-of-the-art features, such as Short-Time Fourier Transform (STFT), Mel Frequency Cepstral Coefficients (MFCC), and Linear Frequency Cepstral Coefficients (LFCC). In addition, this study also investigates the effectiveness of cepstral features over the spectral features for this problem. The highest classification accuracy achieved using UA-Speech corpus is 97.18%, 94.63%, and 98.02% (i.e., absolute improvement of 1.98%, 1.41%, and 1.69%) with CNN, LCNN, and ResNet, respectively, as compared to the MFCC. Further, we evaluate feature discriminative capability using $F1$-score, Matthews Correlation Coefficient (MCC), Jaccard index, and Hamming loss. Finally, analysis of latency period w.r.t. state-of-the-art feature sets indicates the potential of TECC for practical deployment of the severity-level classification system.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129569049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
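As background for the entry above, the discrete Teager energy operator that gives TECC its name can be sketched in a few lines. This shows only the operator itself, ψ[x(n)] = x(n)² − x(n−1)·x(n+1); the filterbank and cepstral steps of the paper's TECC pipeline are not reproduced, and the function name is illustrative.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy: x[n]^2 - x[n-1] * x[n+1] for interior samples."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone A*cos(omega*n + phi) the operator is constant and
# equals A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
n = np.arange(200)
signal = amplitude * np.cos(omega * n + 0.5)
energy = teager_energy(signal)
expected = (amplitude * np.sin(omega)) ** 2
```

Because the output depends on frequency as well as amplitude, it differs from the plain squared-magnitude energy that underlies MFCC-style features.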
Effective ASR Error Correction Leveraging Phonetic, Semantic Information and N-best hypotheses
Hsin-Wei Wang, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen
{"title":"Effective ASR Error Correction Leveraging Phonetic, Semantic Information and N-best hypotheses","authors":"Hsin-Wei Wang, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen","doi":"10.23919/APSIPAASC55919.2022.9979951","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979951","url":null,"abstract":"Automatic speech recognition (ASR) has recently achieved remarkable success and reached human parity, thanks to the synergistic breakthroughs in neural model architectures and training algorithms. However, the performance of ASR in many real-world use cases is still far from perfect. There has been a surge of research interest in designing and developing feasible post-processing modules to improve recognition performance by refining ASR output sentences, which fall roughly into two categories. The first category of methods is ASR N-best hypothesis reranking. ASR N-best hypothesis reranking aims to find the oracle hypothesis with the lowest word error rate from a given N-best hypothesis list. The other category of methods takes inspiration from, for example, Chinese spelling correction (CSC) or English spelling correction (ESC), seeking to detect and correct text-level errors of ASR output sentences. In this paper, we attempt to integrate the above two methods into the ASR error correction (AEC) module and explore the impact of different kinds of features on AEC. Empirical experiments on the widely-used AISHELL-1 dataset show that our proposed method can significantly reduce the word error rate (WER) of the baseline ASR transcripts in relation to some top-of-line AEC methods, thereby demonstrating its effectiveness and practical feasibility.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128071793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
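The notion of an "oracle hypothesis" in the abstract above can be made concrete with a small sketch: word error rate computed from a word-level edit distance, and the oracle chosen as the N-best entry with the lowest WER. The example sentences and function names are made up for illustration; this is not the paper's reranking model, which has no access to the reference at test time.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def word_error_rate(reference, hypothesis):
    """WER = edit distance / number of reference words (whitespace tokenised)."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def oracle_hypothesis(reference, n_best):
    """Pick the N-best entry with the lowest WER; needs the reference, so it is an upper bound."""
    return min(n_best, key=lambda hyp: word_error_rate(reference, hyp))

reference = "turn on the light"
n_best = ["turn on the night", "turn on the light", "turn in the light"]
best = oracle_hypothesis(reference, n_best)
top1_wer = word_error_rate(reference, n_best[0])
```

The gap between the top-1 WER and the oracle WER indicates how much a perfect reranker could gain from the list.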
Object Detection in Aerial Images with Attention-based Regression Loss
Chandler Timm C. Doloriel, R. Cajote
{"title":"Object Detection in Aerial Images with Attention-based Regression Loss","authors":"Chandler Timm C. Doloriel, R. Cajote","doi":"10.23919/APSIPAASC55919.2022.9980311","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980311","url":null,"abstract":"Object detection is a computer vision technique used to identify objects that are usually present in natural scenes. However, the methods used for this case are not easily transferable to detect objects in aerial images. Objects in aerial images are mostly arbitrary-oriented, small, and in complex backgrounds compared to upright and well-focused objects in natural scenes. To effectively detect objects in aerial images, we propose a new regression loss function based on the attention mechanism through attention weights. Using the relative position of the attention weights to the bounding box, the foreground is given more attention, which highlights the target object and effectively suppresses the noise and background. Preliminary experiments are conducted on an attention-based object detector using the DOTA dataset to test the capability of attention mechanism in extracting the contextual information of objects, especially in complex environments.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130834451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Replay Attack Detection Based on Voice and Non-voice Sections for Speaker Verification
Ananda Garin Mills, Patthranit Kaewcharuay, Pannathorn Sathirasattayanon, Suradej Duangpummet, Kasorn Galajit, Jessada Karnjana, P. Aimmanee
{"title":"Replay Attack Detection Based on Voice and Non-voice Sections for Speaker Verification","authors":"Ananda Garin Mills, Patthranit Kaewcharuay, Pannathorn Sathirasattayanon, Suradej Duangpummet, Kasorn Galajit, Jessada Karnjana, P. Aimmanee","doi":"10.23919/APSIPAASC55919.2022.9980225","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980225","url":null,"abstract":"Voice can represent a person's identity. Thus, it can be used in automatic speaker verification (ASV) systems for authenticating secure applications. Unfortunately, existing ASV systems are vulnerable to spoofing attacks. A replay attack is a widely used spoofing technique because it is simple but difficult to detect. Hence, many methods are proposed for countermeasures against replay attacks. Most work inseparably considers voice and non-voice sections in the detection's performance. In this work, we investigate the spoof detection performances when the voice, non-voice, and both with different percentages of voice are used to obtain the optimal section. We also propose a method for detecting replay attacks using the optimal section of a signal. Mel-frequency cepstral coefficients are calculated from the optimal section as a feature, and the ResNet-34 model is used for classification. We evaluated the proposed method using a dataset from the ASVspoof 2019 challenge. The results show that the optimal section for replay attack detection is when 10% and 20% of voice are included in the non-voice sections. It also showed that the proposed method outperforms the baselines with a 7.52% relative improvement or an equal error rate of 1.72%.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130867969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Catastrophic forgetting avoidance method for a Classification Model by Model Synthesis and Introduction of Background Data
Hirayama Akari, Kimura Masaomi
{"title":"Catastrophic forgetting avoidance method for a Classification Model by Model Synthesis and Introduction of Background Data","authors":"Hirayama Akari, Kimura Masaomi","doi":"10.23919/APSIPAASC55919.2022.9980154","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980154","url":null,"abstract":"Animals, including humans, continuously acquire knowledge and skills throughout their lives. However, many machine learning models cannot learn new tasks without forgetting past knowledge. In neural networks, it is common to use one neural network for each training task, and successive training will reduce the accuracy of the previous task. This problem is called catastrophic forgetting, and research on continual learning is being conducted to solve it. In this paper, we propose a method for reducing catastrophic forgetting, where new tasks are trained without retaining previously trained data. Our method assumes that tasks are classification. Our method combines models trained separately for each task, and adds random data to the training data to avoid excessive generalization in the domain where training data do not exist. In the evaluation experiments, we confirmed that our method reduced forgetting for the original two-dimensional dataset and MNIST dataset.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128757045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Simultaneous Frequency Estimation for Three or More Sinusoids Based on Sinusoidal Constraint Differential Equation
Kenta Yamada, Yoshiki Masuyama, Yukoh Wakabayashi, Nobutaka Ono
{"title":"Simultaneous Frequency Estimation for Three or More Sinusoids Based on Sinusoidal Constraint Differential Equation","authors":"Kenta Yamada, Yoshiki Masuyama, Yukoh Wakabayashi, Nobutaka Ono","doi":"10.23919/APSIPAASC55919.2022.9980228","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980228","url":null,"abstract":"In this paper, we present a short-time frequency estimation method that can handle multiple sinusoids simultaneously. Frequency estimation is a fundamental problem in audio analysis. For realizing high-temporal resolution, an approach based on a differential equation of a sinusoid, which is referred to as the sinusoidal constraint differential equation (SCDE), has been proposed. The SCDE-based method can efficiently and accurately estimate frequency even from a short-term signal. However, in terms of simultaneous estimation, up to two sinusoids have been considered so far. In this paper, we extend this approach to three or more sinusoids. Our experimental results show that our method outperformed existing methods based on the discrete Fourier transform.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129084214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
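The differential-equation idea behind the entry above can be illustrated for the simplest case. A single sinusoid satisfies x'' = −ω²x; its discrete-time counterpart is x[n+1] + x[n−1] = 2·cos(ω)·x[n], which can be solved for ω by least squares over a short frame. The sketch below uses that discrete analogue for one sinusoid only; it is an assumption-laden illustration, not the paper's SCDE formulation for three or more components, and the function name is hypothetical.

```python
import numpy as np

def estimate_frequency(x, sample_rate):
    """Least-squares fit of x[n+1] + x[n-1] = 2*cos(omega)*x[n] for a single sinusoid."""
    x = np.asarray(x, dtype=float)
    mid = x[1:-1]
    neighbours = x[:-2] + x[2:]
    cos_omega = np.dot(mid, neighbours) / (2.0 * np.dot(mid, mid))
    # Clip guards against values slightly outside [-1, 1] caused by noise or rounding.
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    return omega * sample_rate / (2.0 * np.pi)

# A 4 ms frame (64 samples at 16 kHz) holds fewer than two periods of a 440 Hz tone.
sample_rate = 16000.0
t = np.arange(64) / sample_rate
frame = 0.8 * np.sin(2 * np.pi * 440.0 * t + 0.3)
estimate = estimate_frequency(frame, sample_rate)
```

For a noiseless tone the estimate is exact up to rounding even though the frame is far too short for a DFT bin spacing (250 Hz here) to resolve 440 Hz directly.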
Application of Deep Learning-based Single-channel Speech Enhancement for Frequency-modulation Transmitted Speech
Yingyi Ma, Xueliang Zhang
{"title":"Application of Deep Learning-based Single-channel Speech Enhancement for Frequency-modulation Transmitted Speech","authors":"Yingyi Ma, Xueliang Zhang","doi":"10.23919/APSIPAASC55919.2022.9980216","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980216","url":null,"abstract":"There are three main interferences in the FM signal transmission process: the multipath effect, the Doppler effect, and white noise. These interferences have significant influences on speech. We propose a method that uses a masking or mapping approach for single-channel speech enhancement in wireless communication. Since the method improves speech quality by focusing on three interferences simultaneously, it is simpler in comparison to conventional methods. Experiments are conducted on a dataset that we simulated ourselves. Because PESQ and STOI need reference targets, it is hard to evaluate the performance using real-world data, so we only give the spectral comparison of the real data enhancement results. Simulation results show excellent speech enhancement performance on the unprocessed mixture and significantly improved speech quality on the actual collected data. This verifies the feasibility of deep learning on this kind of task. Future studies will be made to improve the real-time performance and compress the number of network parameters.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130796944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Restoration of High-Frequency Components in Under Display Camera Images
Youngjin Oh, G. Park, N. Cho
{"title":"Restoration of High-Frequency Components in Under Display Camera Images","authors":"Youngjin Oh, G. Park, N. Cho","doi":"10.23919/APSIPAASC55919.2022.9979964","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979964","url":null,"abstract":"Under-Display Camera (UDC) systems have been developed to remove noticeable camera holes or notches and entirely cover the front side with the screen. As the name implies, UDCs are placed under the display, generally an organic light-emitting diode (OLED) panel these days. Since the OLED panel is not transparent and consists of circuits and display devices, the light reaching the camera experiences a loss of photons and a complicated point spread function (PSF). As a result, the images obtained through the UDC system usually experience a color shift, decreased intensity, complex artifacts due to the PSF, and loss/distortion in high-frequency details. To overcome these degradations, we exploit a multi-stage image restoration network and a frequency loss function. The network utilizes deformable convolutions to solve the spatially-variant degradations in UDC images based on the fact that the kernel of deformable convolutions is dynamic and adaptive to input. We also apply frequency reconstruction loss when training our models to better restore the lost high-frequency components due to the complicated PSF. We show that our method effectively removes the degradation caused by the UDC system and achieves state-of-the-art performance on a benchmark dataset.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"285 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116854253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Approximated ADMM based Algorithm for $\ell_{1}-\ell_{2}$ Optimization Problem
Rui Lin, Kazunori Hayashi
{"title":"An Approximated ADMM based Algorithm for $\\ell_{1}-\\ell_{2}$ Optimization Problem","authors":"Rui Lin, Kazunori Hayashi","doi":"10.23919/APSIPAASC55919.2022.9980002","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980002","url":null,"abstract":"Compressed sensing is a technique to recover a sparse vector from its underdetermined linear measurements. Since a naive $\\ell_{0}$ optimization approach is hard to tackle due to the discreteness and the non-convexity of the $\\ell_{0}$ norm, a relaxed problem of the $\\ell_{1}-\\ell_{2}$ optimization is often employed for the reconstruction of the sparse vector especially when the measurement noise is not negligible. FISTA (fast iterative shrinkage-thresholding algorithm) is one of the popular algorithms for the $\\ell_{1}-\\ell_{2}$ optimization, and is known to achieve the optimal convergence rate among the first order methods. Recently, the employment of optical circuits for various signal processing including deep neural networks has been considered intensively, but it is difficult to implement FISTA with the optical circuit, because it requires operations of divisions with a dynamic value in the algorithm. In this paper, assuming the implementation with the optical circuit, we propose an ADMM (alternating direction method of multipliers) based algorithm for the $\\ell_{1}-\\ell_{2}$ optimization. It is true that an ADMM based algorithm for the $\\ell_{1}-\\ell_{2}$ optimization has been already proposed in the literature, but the proposed algorithm is derived with a different formulation from the existing method, and unlike the existing ADMM based algorithm, the proposed algorithm does not include the calculation of the inverse of a matrix. Computer simulation results demonstrate that the proposed algorithm can achieve performance comparable to FISTA or the existing ADMM based algorithm while requiring no division operations and no matrix inversions.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131068050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
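To show where the "division with a dynamic value" mentioned in the abstract above arises, here is a minimal sketch of standard FISTA. It assumes the usual ℓ1-ℓ2 objective, minimise ½‖Ax − b‖² + λ‖x‖₁, which the abstract does not spell out. It is the baseline the paper compares against, not the proposed ADMM based algorithm, and the variable names are illustrative.

```python
import numpy as np

def soft_threshold(v, threshold):
    """Proximal operator of the l1 norm: shrink every entry towards zero."""
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)

def fista(A, b, lam, num_iters=200):
    """Standard FISTA for 0.5*||A x - b||^2 + lam*||x||_1 (assumed objective)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Lipschitz constant of the gradient: squared largest singular value of A.
    lipschitz = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(num_iters):
        grad = A.T @ (A @ y - b)
        x_next = soft_threshold(y - grad / lipschitz, lam / lipschitz)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # The momentum weight divides by t_next, which changes every iteration.
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

# With A = I the minimiser is simply the soft-thresholded measurement vector.
b = np.array([3.0, -0.5, 0.2, -2.0])
solution = fista(np.eye(4), b, lam=1.0)
```

The divisions by `lipschitz` use a fixed constant and could be precomputed, whereas `(t - 1.0) / t_next` varies per iteration, which is presumably the operation the abstract describes as hard to realise in an optical circuit.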