{"title":"Scrambling-Embedding in Partially-Encrypted Images","authors":"Koi Yee Ng, Simying Ong","doi":"10.23919/APSIPAASC55919.2022.9979991","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979991","url":null,"abstract":"In this paper, an improved scrambling-embedding technique, namely a row-rotational-based data hiding method, is proposed to hide data in partially-encrypted images. The partially-encrypted images are generated by performing a bit-wise XOR cipher to investigate the feasibility of applying the proposed method at various encryption levels. The proposed method divides each row into multiple non-overlapping continuous partitions. These partitions are arranged in a rotational manner to create different states, and each state is used to represent specific data in binary representation. During the decoding process, an α notation is introduced to reduce the number of failure rows, which would otherwise cause further image degradation and incorrect data extraction. The BSDS300 dataset is used for the experiments and is encrypted with different encryption strengths. From the experimental results, it is observed that when the least significant bits are encrypted, the proposed data hiding method using the scrambling-embedding technique can still perform as well as in the plain image domain.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129375033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teager Energy Cepstral Coefficients For Classification of Dysarthric Speech Severity-Level","authors":"Aastha Kachhi, Anand Therattil, Ankur T. Patil, Hardik B. Sailor, H. Patil","doi":"10.23919/APSIPAASC55919.2022.9980322","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980322","url":null,"abstract":"Dysarthria is a neuro-motor speech impairment that renders speech unintelligible, with severity-levels that are generally imperceptible to humans. Dysarthric speech classification acts as a diagnostic tool for evaluating the advancement in a patient's severity condition and also aids in automatic dysarthric speech recognition systems (an important assistive speech technology). This study investigates the significance of Teager Energy Cepstral Coefficients (TECC) in dysarthric speech classification using three deep learning architectures, namely, Convolutional Neural Network (CNN), Light-CNN (LCNN), and Residual Networks (ResNet). The performance of TECC is compared with state-of-the-art features, such as Short-Time Fourier Transform (STFT), Mel Frequency Cepstral Coefficients (MFCC), and Linear Frequency Cepstral Coefficients (LFCC). In addition, this study also investigates the effectiveness of cepstral features over spectral features for this problem. The highest classification accuracy achieved using the UA-Speech corpus is 97.18%, 94.63%, and 98.02% (i.e., absolute improvement of 1.98%, 1.41%, and 1.69%) with CNN, LCNN, and ResNet, respectively, as compared to the MFCC. Further, we evaluate feature discriminative capability using $F1$-score, Matthews Correlation Coefficient (MCC), Jaccard index, and Hamming loss. Finally, analysis of the latency period w.r.t. state-of-the-art feature sets indicates the potential of TECC for practical deployment of the severity-level classification system.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129569049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective ASR Error Correction Leveraging Phonetic, Semantic Information and N-best hypotheses","authors":"Hsin-Wei Wang, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen","doi":"10.23919/APSIPAASC55919.2022.9979951","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979951","url":null,"abstract":"Automatic speech recognition (ASR) has recently achieved remarkable success and reached human parity, thanks to the synergistic breakthroughs in neural model architectures and training algorithms. However, the performance of ASR in many real-world use cases is still far from perfect. There has been a surge of research interest in designing and developing feasible post-processing modules to improve recognition performance by refining ASR output sentences, which fall roughly into two categories. The first category of methods is ASR N-best hypothesis reranking, which aims to find the oracle hypothesis with the lowest word error rate from a given N-best hypothesis list. The other category of methods takes inspiration from, for example, Chinese spelling correction (CSC) or English spelling correction (ESC), seeking to detect and correct text-level errors of ASR output sentences. In this paper, we attempt to integrate the above two methods into the ASR error correction (AEC) module and explore the impact of different kinds of features on AEC. Empirical experiments on the widely-used AISHELL-1 dataset show that our proposed method can significantly reduce the word error rate (WER) of the baseline ASR transcripts in relation to some top-of-the-line AEC methods, thereby demonstrating its effectiveness and practical feasibility.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128071793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object Detection in Aerial Images with Attention-based Regression Loss","authors":"Chandler Timm C. Doloriel, R. Cajote","doi":"10.23919/APSIPAASC55919.2022.9980311","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980311","url":null,"abstract":"Object detection is a computer vision technique used to identify objects that are usually present in natural scenes. However, the methods used for this case are not easily transferable to detect objects in aerial images. Objects in aerial images are mostly arbitrary-oriented, small, and in complex backgrounds compared to upright and well-focused objects in natural scenes. To effectively detect objects in aerial images, we propose a new regression loss function based on the attention mechanism through attention weights. Using the relative position of the attention weights to the bounding box, the foreground is given more attention, which highlights the target object and effectively suppresses the noise and background. Preliminary experiments are conducted on an attention-based object detector using the DOTA dataset to test the capability of attention mechanism in extracting the contextual information of objects, especially in complex environments.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130834451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Replay Attack Detection Based on Voice and Non-voice Sections for Speaker Verification","authors":"Ananda Garin Mills, Patthranit Kaewcharuay, Pannathorn Sathirasattayanon, Suradej Duangpummet, Kasorn Galajit, Jessada Karnjana, P. Aimmanee","doi":"10.23919/APSIPAASC55919.2022.9980225","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980225","url":null,"abstract":"Voice can represent a person's identity. Thus, it can be used in automatic speaker verification (ASV) systems for authenticating secure applications. Unfortunately, existing ASV systems are vulnerable to spoofing attacks. A replay attack is a widely used spoofing technique because it is simple but difficult to detect. Hence, many methods have been proposed as countermeasures against replay attacks. Most work does not separate voice and non-voice sections when evaluating detection performance. In this work, we investigate the spoof detection performance when the voice sections, the non-voice sections, and both with different percentages of voice are used, in order to obtain the optimal section. We also propose a method for detecting replay attacks using the optimal section of a signal. Mel-frequency cepstral coefficients are calculated from the optimal section as a feature, and the ResNet-34 model is used for classification. We evaluated the proposed method using a dataset from the ASVspoof 2019 challenge. The results show that the optimal section for replay attack detection is when 10% and 20% of voice are included in the non-voice sections. They also show that the proposed method outperforms the baselines with a 7.52% relative improvement or an equal error rate of 1.72%.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130867969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Catastrophic forgetting avoidance method for a Classification Model by Model Synthesis and Introduction of Background Data","authors":"Hirayama Akari, Kimura Masaomi","doi":"10.23919/APSIPAASC55919.2022.9980154","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980154","url":null,"abstract":"Animals, including humans, continuously acquire knowledge and skills throughout their lives. However, many machine learning models cannot learn new tasks without forgetting past knowledge. In neural networks, it is common to use one neural network for each training task, and successive training will reduce the accuracy of the previous task. This problem is called catastrophic forgetting, and research on continual learning is being conducted to solve it. In this paper, we propose a method for reducing catastrophic forgetting, where new tasks are trained without retaining previously trained data. Our method assumes that the tasks are classification tasks. Our method combines models separately trained for each task, and adds random data to the training data to avoid excessive generalization in the domain where training data do not exist. In the evaluation experiments, we confirmed that our method reduced forgetting for the original two-dimensional dataset and the MNIST dataset.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128757045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Frequency Estimation for Three or More Sinusoids Based on Sinusoidal Constraint Differential Equation","authors":"Kenta Yamada, Yoshiki Masuyama, Yukoh Wakabayashi, Nobutaka Ono","doi":"10.23919/APSIPAASC55919.2022.9980228","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980228","url":null,"abstract":"In this paper, we present a short-time frequency estimation method that can handle multiple sinusoids simultaneously. Frequency estimation is a fundamental problem in audio analysis. For realizing high-temporal resolution, an approach based on a differential equation of a sinusoid, which is referred to as the sinusoidal constraint differential equation (SCDE), has been proposed. The SCDE-based method can efficiently and accurately estimate frequency even from a short-term signal. However, in terms of simultaneous estimation, up to two sinusoids have been considered so far. In this paper, we extend this approach to three or more sinusoids. Our experimental results show that our method outperformed existing methods based on the discrete Fourier transform.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129084214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Deep Learning-based Single-channel Speech Enhancement for Frequency-modulation Transmitted Speech","authors":"Yingyi Ma, Xueliang Zhang","doi":"10.23919/APSIPAASC55919.2022.9980216","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980216","url":null,"abstract":"There are three main interferences in the FM signal transmission process: the multipath effect, the Doppler effect, and white noise. These interferences have significant influences on speech. We propose a method that uses a masking or mapping approach for single-channel speech enhancement in wireless communication. Since the method improves speech quality by focusing on the three interferences simultaneously, it is simpler than conventional methods. Experiments are conducted on a dataset that we simulated ourselves. Because PESQ and STOI need reference targets, it is hard to evaluate the performance using real-world data, so we only give the spectral comparison of the real data enhancement results. Simulation results show excellent speech enhancement performance over the unprocessed mixture, and the method significantly improves speech quality on the actual collected data. This verifies the feasibility of deep learning for this kind of task. Future work will improve the real-time performance and reduce the number of network parameters.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130796944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Restoration of High-Frequency Components in Under Display Camera Images","authors":"Youngjin Oh, G. Park, N. Cho","doi":"10.23919/APSIPAASC55919.2022.9979964","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979964","url":null,"abstract":"Under-Display Camera (UDC) systems have been developed to remove noticeable camera holes or notches and entirely cover the front side with the screen. As the name implies, UDCs are placed under the display, generally an organic light-emitting diode (OLED) panel these days. Since the OLED panel is not transparent and consists of circuits and display devices, the light reaching the camera experiences a loss of photons and a complicated point spread function (PSF). As a result, the images obtained through the UDC system usually experience a color shift, decreased intensity, complex artifacts due to the PSF, and loss/distortion in high-frequency details. To overcome these degradations, we exploit a multi-stage image restoration network and a frequency loss function. The network utilizes deformable convolutions to handle the spatially-variant degradations in UDC images, based on the fact that the kernel of deformable convolutions is dynamic and adaptive to the input. We also apply a frequency reconstruction loss when training our models to better restore the high-frequency components lost due to the complicated PSF. We show that our method effectively removes the degradation caused by the UDC system and achieves state-of-the-art performance on a benchmark dataset.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"285 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116854253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Approximated ADMM based Algorithm for $\\ell_{1}-\\ell_{2}$ Optimization Problem","authors":"Rui Lin, Kazunori Hayashi","doi":"10.23919/APSIPAASC55919.2022.9980002","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980002","url":null,"abstract":"Compressed sensing is a technique to recover a sparse vector from its underdetermined linear measurements. Since a naive $\\ell_{0}$ optimization approach is hard to tackle due to the discreteness and the non-convexity of the $\\ell_{0}$ norm, a relaxed problem, the $\\ell_{1}-\\ell_{2}$ optimization, is often employed for the reconstruction of the sparse vector, especially when the measurement noise is not negligible. FISTA (fast iterative shrinkage-thresholding algorithm) is one of the popular algorithms for the $\\ell_{1}-\\ell_{2}$ optimization, and is known to achieve the optimal convergence rate among first-order methods. Recently, the employment of optical circuits for various signal processing tasks, including deep neural networks, has been considered intensively, but it is difficult to implement FISTA with an optical circuit, because the algorithm requires division operations by a dynamic value. In this paper, assuming implementation with an optical circuit, we propose an ADMM (alternating direction method of multipliers) based algorithm for the $\\ell_{1}-\\ell_{2}$ optimization. Although an ADMM based algorithm for the $\\ell_{1}-\\ell_{2}$ optimization has already been proposed in the literature, the proposed algorithm is derived with a different formulation from the existing method, and unlike the existing ADMM based algorithm, the proposed algorithm does not include the calculation of the inverse of a matrix. Computer simulation results demonstrate that the proposed algorithm can achieve performance comparable to FISTA or the existing ADMM based algorithm while requiring no division operations and no matrix inversions.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131068050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}