2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

VocEmb4SVS: Improving Singing Voice Separation with Vocal Embeddings
Chenyi Li, Yi Li, Xuhao Du, Yaolong Ju, Shichao Hu, Zhiyong Wu
{"title":"VocEmb4SVS: Improving Singing Voice Separation with Vocal Embeddings","authors":"Chenyi Li, Yi Li, Xuhao Du, Yaolong Ju, Shichao Hu, Zhiyong Wu","doi":"10.23919/APSIPAASC55919.2022.9980293","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980293","url":null,"abstract":"Deep learning-based methods have shown promising performance on singing voice separation (SVS). Recently, embeddings related to lyrics and voice activities have been proven effective to improve the performance of SVS tasks. However, embeddings related to singers have never been studied before. In this paper, we propose VocEmb4SVS, an SVS framework to utilize vocal embeddings of the singer as auxiliary knowledge for SVS conditioning. First, a pre-trained separation network is employed to obtain pre-separated vocals from the mixed music signals. Second, a vocal encoder is trained to extract vocal embeddings from the pre-separated vocals. Finally, the vocal embeddings are integrated into the separation network to improve SVS performance. Experimental results show that our proposed method achieves state-of-the-art performance on the MUSDB18 dataset with an SDR of 9.56 dB on vocals.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132818689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
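A minimal illustrative sketch (not the authors' released code) of the conditioning step described in the abstract above: a vocal/singer embedding modulates the features of a spectrogram-masking separator, here via a FiLM-style scale-and-shift. All layer sizes, shapes, and names are assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingConditionedSeparator(nn.Module):
    """Toy spectrogram-masking separator conditioned on a vocal embedding (illustrative only)."""
    def __init__(self, n_bins=513, emb_dim=256, hidden=512):
        super().__init__()
        self.encoder = nn.GRU(n_bins, hidden, num_layers=2, batch_first=True)
        # Project the vocal embedding to per-channel scale and shift (FiLM-style conditioning).
        self.film = nn.Linear(emb_dim, 2 * hidden)
        self.mask_head = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, mix_mag, vocal_emb):
        # mix_mag: (batch, frames, n_bins); vocal_emb: (batch, emb_dim)
        h, _ = self.encoder(mix_mag)
        scale, shift = self.film(vocal_emb).chunk(2, dim=-1)
        h = h * scale.unsqueeze(1) + shift.unsqueeze(1)   # broadcast over time frames
        mask = self.mask_head(h)
        return mix_mag * mask                             # estimated vocal magnitude

mix = torch.rand(2, 100, 513)      # placeholder mixture magnitude spectrogram
emb = torch.rand(2, 256)           # would come from a vocal encoder applied to pre-separated vocals
vocals = EmbeddingConditionedSeparator()(mix, emb)
```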
Karaoke Generation from songs: recent trends and opportunities
Preet Patel, Ansh Ray, Khushboo Thakkar, Kahan Sheth, Sapan H. Mankad
{"title":"Karaoke Generation from songs: recent trends and opportunities","authors":"Preet Patel, Ansh Ray, Khushboo Thakkar, Kahan Sheth, Sapan H. Mankad","doi":"10.23919/APSIPAASC55919.2022.9980133","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980133","url":null,"abstract":"Music Information Retrieval is a crucial task which has ample opportunities in Music Industries. Currently, audio engineers have to create custom karaoke tracks manually for songs. The technique of producing a high-quality karaoke track for a song is not accessible to the public. Audacity and other specialised software must be needed to generate karaoke. In this work, we review different methods and approaches, which give a high-quality karaoke track by presenting a simple and quick separation of vocals from a given song with both vocal and instrumental components. It does not need the use of any specific audio processing software. We review techniques and approaches for generating karaoke such as Spleeter, Hybrid Demucs, D3Net, Open-Unmix, Sams-Net etc. These approaches are based on current state-of-the-art machine learning and deep learning techniques. We believe that this review will serve the purpose as a good resource for researchers working in this field.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134211165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
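For context, a hedged usage example of one of the tools surveyed above, Spleeter, assuming it is installed (`pip install spleeter`); the file paths are placeholders.

```python
from spleeter.separator import Separator

# Pretrained two-stem model: vocals vs. accompaniment.
separator = Separator('spleeter:2stems')

# Writes output/song/vocals.wav and output/song/accompaniment.wav,
# i.e. the instrumental track usable as a karaoke backing.
separator.separate_to_file('song.mp3', 'output/')
```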
DNN-Rule Hybrid Dyna-Q for Sample-Efficient Task-Oriented Dialog Policy Learning
Mingxin Zhang, T. Shinozaki
{"title":"DNN-Rule Hybrid Dyna-Q for Sample-Efficient Task-Oriented Dialog Policy Learning","authors":"Mingxin Zhang, T. Shinozaki","doi":"10.23919/APSIPAASC55919.2022.9980344","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980344","url":null,"abstract":"Reinforcement learning (RL) is a powerful strategy for making a flexible task-oriented dialog agent, but it is weak in learning speed. Deep Dyna-Q augments the agent's experience to improve the learning efficiency by internally simulating the user's behavior. It uses a deep neural network (DNN) based learnable user model to predict user action, reward, and dialog termination from the dialog state and the agent's action. However, it still needs many agent-user interactions to train the user model. We propose a DNN-Rule hybrid user model for Dyna-Q, where the DNN only simulates the user action. Instead, a rule-based function infers the reward and the dialog termination. We also investigate the training with rollout to further enhance the learning efficiency. Experiments on a movie-ticket booking task demonstrate that our approach significantly improves learning efficiency.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133112838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
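A minimal sketch of the hybrid world-model idea, under assumed interfaces (none of the names below come from the paper): a learned network predicts only the next user action, while hand-written rules supply reward and termination for Dyna-Q planning steps.

```python
import torch
import torch.nn as nn

class UserActionModel(nn.Module):
    """Learned part of the hybrid user model: predicts user-action logits only."""
    def __init__(self, state_dim, agent_action_dim, user_action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + agent_action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, user_action_dim))

    def forward(self, state, agent_action):
        return self.net(torch.cat([state, agent_action], dim=-1))

def rule_based_reward_and_done(state, max_turns=20, success_reward=40.0):
    # Hypothetical rules standing in for the paper's rule-based function:
    # large positive reward on task success, small per-turn penalty otherwise;
    # the simulated episode ends on success or when the turn budget runs out.
    if state['all_slots_confirmed']:
        return success_reward, True
    return -1.0, state['turn'] >= max_turns
```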
Design of Optimal FIR Digital Filter by Swarm Optimization Technique
Jin Wu, Yaqiong Gao, L. Yang, Zhengdong Su
{"title":"Design of Optimal FIR Digital Filter by Swarm Optimization Technique","authors":"Jin Wu, Yaqiong Gao, L. Yang, Zhengdong Su","doi":"10.23919/APSIPAASC55919.2022.9980121","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980121","url":null,"abstract":"Finite Impulse Response (FIR) digital filters are widely used in digital signal processing and other engineering because of their strict stability and linear phase. Aiming at the problems of low accuracy and weak optimization ability of traditional method to design digital filter, the newly proposed Grey Wolf Optimization (GWO) algorithm is used in this paper to design a linear-phase FIR filter to obtain the optimal transition-band sample value in the frequency sampling method to obtain the minimum stop-band attenuation, so as to improve the performance of the filter. And improved by embedding Lévy Flight (LF), which is the modified Lévy-embedded GWO (LGWO). Finally, the performance of traditional frequency sampling methods and optimization algorithms GWO and LGWO are compared. When the number of sampling points is 65 and 97, the stopband attenuation of LGWO is improved by 0.2029 dB and 0.2454 dB respectively compared with GWO algorithm. The better performance of LGWO is shown in the results.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133292263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
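An illustrative sketch of the underlying frequency-sampling design problem, with a plain grid search standing in for the paper's Lévy-embedded Grey Wolf Optimizer: tune one transition-band sample of a lowpass prototype to maximize stop-band attenuation. Band edges and filter length are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import firwin2, freqz

numtaps, pass_edge, stop_edge = 65, 0.4, 0.5   # normalized frequencies (1.0 = Nyquist)

def stopband_attenuation(t):
    # Desired response: 1 in the passband, t at the single transition-band sample, 0 in the stopband.
    freq = [0.0, pass_edge, (pass_edge + stop_edge) / 2, stop_edge, 1.0]
    gain = [1.0, 1.0, float(t), 0.0, 0.0]
    h = firwin2(numtaps, freq, gain)
    w, H = freqz(h, worN=4096)
    stop = (w / np.pi) >= stop_edge
    return -20 * np.log10(np.max(np.abs(H[stop])) + 1e-12)   # dB below the ~0 dB passband

# Grid search over the transition sample; a swarm optimizer would search the same objective.
candidates = np.linspace(0.0, 1.0, 201)
best_t = max(candidates, key=stopband_attenuation)
print(best_t, stopband_attenuation(best_t))
```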
Educational Multi-Purpose Kit for Coding and Robotic Design
Atikhun Thongpool, D. Hormdee, Raksit Chutipakdeevong, Wasan Tansakul
{"title":"Educational Multi-Purpose Kit for Coding and Robotic Design","authors":"Atikhun Thongpool, D. Hormdee, Raksit Chutipakdeevong, Wasan Tansakul","doi":"10.23919/APSIPAASC55919.2022.9979911","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979911","url":null,"abstract":"Nowadays, there has been a rapid evolution and transformation of technology. Many innovative technologies have emerged including artificial intelligence, biomedical engineering, automation systems, quantum computing, big data, blockchain, etc. These emerging technologies have also transformed our lifestyles. This transformation has then inevitably required a new set of skills; Computational Thinking/Logical Thinking which can be compiled into Coding skills. Several novel educational media and teaching materials have been promoted. Current educational kits in the market can be classified into 3 main categories. These structures vary from physical kits vs virtual kits vs hybrid kits, while coding styles vary from block-based vs text-based. This paper presents an educational multi-purpose kit for coding and robotic design which has a hybrid kits structure with block-based coding style. Its connection scheme has been designed as wired/wireless plug-and-play via magnetic. The implemented prototype could be resilient for various learning activities, including emulating three (touch, hearing and sight) out of five basic human senses via sensors and actuators. A use case on shape recognition, using computer vision, has been illustrated to show how the implemented system works.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"492 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132196612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
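A small, hedged sketch of the kind of computer-vision shape recognition mentioned as a use case above (not the kit's actual software): classify simple shapes by counting the vertices of approximated contours with OpenCV. The vertex-count heuristic and the thresholding assumption (dark shapes on a light background) are illustrative choices.

```python
import cv2

def recognize_shapes(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Assumes dark shapes on a light background; invert the threshold otherwise.
    _, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    names = {3: 'triangle', 4: 'rectangle'}
    shapes = []
    for c in contours:
        approx = cv2.approxPolyDP(c, 0.04 * cv2.arcLength(c, True), True)
        # Many vertices after approximation -> treat as a circle; otherwise a generic polygon.
        shapes.append(names.get(len(approx), 'circle' if len(approx) > 6 else 'polygon'))
    return shapes
```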
Detection and Correction of Adversarial Examples Based on JPEG-Compression-Derived Distortion
Kenta Tsunomori, Yuma Yamasaki, M. Kuribayashi, N. Funabiki, I. Echizen
{"title":"Detection and Correction of Adversarial Examples Based on IPEG-Compression-Derived Distortion","authors":"Kenta Tsunomori, Yuma Yamasaki, M. Kuribayashi, N. Funabiki, I. Echizen","doi":"10.23919/APSIPAASC55919.2022.9980147","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980147","url":null,"abstract":"An effective way to defend against adversarial examples (AEs), which are used, for example, to attack applications such as face recognition, is to detect in advance whether an input image is an AE. Some AE defense methods focus on the response characteristics of image classifiers when denoising filters are applied to the input image. However, several filters are required, which results in a large amount of computation. Because JPEG compression of AEs effectively removes adversarial perturbations, the difference between the image before and after JPEG compression should be highly correlated with the perturbations. However, the difference should not be completely consistent with adversarial perturbations. We have developed a filtering operation that modulates this difference by varying their magnitude and positive/negative sign and adding them to an image so that adversarial perturbations can be effectively removed. We consider that adversarial perturbations that could not be removed by simply applying JPEG compression can be removed by modulating this difference. Furthermore, applying a resizing process to the image after adding these distortions enables us to remove perturbations that could not be removed otherwise. The filtering operation will successfully remove the adversarial noise and reconstruct the corrected samples from AEs. We also consider a simple but effective reconstruction method based on the filtering operations. Experiments in which the adversarial attack used was not known to the detector demonstrated that the proposed method could achieve better performance in terms of accuracy with reasonable computational complexity. In addition, the percentage of correct classification results after applying the proposed filter for non-targeted attacks was higher than that of JPEG compression and scaling. These results suggest that the proposed method effectively removes adversarial perturbations and is an effective filter for detecting AEs.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128993205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
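A hedged sketch of the general idea described above (not the authors' exact procedure): take the image-minus-JPEG difference as an estimate of the removable perturbation, re-apply it with a tunable sign and magnitude, then resize. The JPEG quality, the modulation factor `alpha`, and the resize target are assumptions.

```python
import io
import numpy as np
from PIL import Image

def jpeg_filter(img, quality=75, alpha=-1.5, size=(224, 224)):
    """img: HxWx3 uint8 array. Returns a filtered, resized uint8 array."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format='JPEG', quality=quality)
    buf.seek(0)
    compressed = np.asarray(Image.open(buf), dtype=np.float32)
    diff = img.astype(np.float32) - compressed        # roughly the JPEG-removable perturbation
    # Modulate the difference (sign/magnitude via alpha) and add it back to the image;
    # alpha = -1 reproduces plain JPEG compression, other values over- or under-correct.
    filtered = np.clip(img.astype(np.float32) + alpha * diff, 0, 255).astype(np.uint8)
    return np.asarray(Image.fromarray(filtered).resize(size))

# Detection idea: if a classifier's prediction on the original image and on
# jpeg_filter(image) disagree, flag the input as a likely adversarial example.
```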
Quality Enhancement of Screen Content Video using Dual-input CNN
Ziyin Huang, Yue Cao, Sik-Ho Tsang, Yui-Lam Chan, K. Lam
{"title":"Quality Enhancement of Screen Content Video using Dual-input CNN","authors":"Ziyin Huang, Yue Cao, Sik-Ho Tsang, Yui-Lam Chan, K. Lam","doi":"10.23919/APSIPAASC55919.2022.9979969","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979969","url":null,"abstract":"In recent years, the video quality enhancement techniques have made a significant breakthrough, from the traditional methods, such as deblocking filter (DF) and sample additive offset (SAO), to deep learning-based approaches. While screen content coding (SCC) has become an important extension in High Efficiency Video Coding (HEVC), the existing approaches mainly focus on improving the quality of natural sequences in HEVC, not the screen content (SC) sequences in SCC. Therefore, we proposed a dual-input model for quality enhancement in SCC. One is the main branch with the image as input. Another one is the mask branch with side information extracted from the coded bitstream. Specifically, a mask branch is designed so that the coding unit (CU) information and the mode information are utilized as input, to assist the convolutional network at the main branch to further improve the video quality thereby the coding efficiency. Moreover, due to the limited number of SC videos, a new SCC dataset, namely PolyUSCC, is established. With our proposed dual-input technique, compared with the conventional SCC, BD-rates are further reduced 3.81% and 3.07%, by adding our mask branch onto two state-of-the-art models, DnCNN and DCAD, respectively.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123913961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
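A minimal sketch, under assumed channel counts and layer choices (not the published architecture), of a dual-input enhancer in the spirit of the abstract above: the main branch takes the decoded frame, the mask branch takes CU/mode side information rendered at pixel resolution, and the fused features predict a residual correction.

```python
import torch
import torch.nn as nn

class DualInputEnhancer(nn.Module):
    """Toy dual-branch quality-enhancement CNN (illustrative shapes and depths)."""
    def __init__(self, feats=32):
        super().__init__()
        self.main = nn.Sequential(nn.Conv2d(1, feats, 3, padding=1), nn.ReLU())
        self.mask = nn.Sequential(nn.Conv2d(1, feats, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feats, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, 1, 3, padding=1))

    def forward(self, frame, coding_mask):
        f = torch.cat([self.main(frame), self.mask(coding_mask)], dim=1)
        return frame + self.fuse(f)        # predict a residual correction to the decoded frame

frame = torch.rand(1, 1, 64, 64)           # placeholder decoded luma patch
coding_mask = torch.rand(1, 1, 64, 64)     # placeholder CU-size / mode map rendered as an image
enhanced = DualInputEnhancer()(frame, coding_mask)
```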
Speech Intelligibility Prediction for Hearing Aids Using an Auditory Model and Acoustic Parameters
Benita Angela Titalim, Candy Olivia Mawalim, S. Okada, M. Unoki
{"title":"Speech Intelligibility Prediction for Hearing Aids Using an Auditory Model and Acoustic Parameters","authors":"Benita Angela Titalim, Candy Olivia Mawalim, S. Okada, M. Unoki","doi":"10.23919/APSIPAASC55919.2022.9980000","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980000","url":null,"abstract":"Objective speech intelligibility (SI) metrics for hearing-impaired people play an important role in hearing aid development. The work on improving SI prediction also became the basis of the first Clarity Prediction Challenge (CPC1). This study investigates a physiological auditory model called EarModel and acoustic parameters for SI prediction. EarModel is utilized because it provides advantages in estimating human hearing, both normal and impaired. The hearing-impaired condition is simulated in EarModel based on audiograms; thus, the SI perceived by hearing-impaired people is more accurately predicted. Moreover, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) and WavLM, as additional acoustic parameters for estimating the difficulty levels of given utterances, are included to achieve improved prediction accuracy. The proposed method is evaluated on the CPC1 database. The results show that the proposed method improves the SI prediction effects of the baseline and hearing aid speech prediction index (HASPI). Additionally, an ablation test shows that incorporating the eGeMAPS and WavLM can significantly contribute to the prediction model by increasing the Pearson correlation coefficient by more than 15% and decreasing the root-mean-square error (RMSE) by more than 10.00 in both closed-set and open-set tracks.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124028029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
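A hedged example of extracting one of the auxiliary feature sets named above, eGeMAPS functionals, with the openSMILE Python package (`pip install opensmile`); the file name is a placeholder, and combining these features with the auditory-model and WavLM representations is omitted.

```python
import opensmile

# eGeMAPSv02 functionals: 88 utterance-level acoustic descriptors per file.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Returns a pandas DataFrame with one row per processed file.
features = smile.process_file('enhanced_utterance.wav')
print(features.shape)
```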
Obstructive Sleep Apnea Classification Using Snore Sounds Based on Deep Learning
Apichada Sillaparaya, A. Bhatranand, Chudanat Sudthongkong, K. Chamnongthai, Y. Jiraraksopakun
{"title":"Obstructive Sleep Apnea Classification Using Snore Sounds Based on Deep Learning","authors":"Apichada Sillaparaya, A. Bhatranand, Chudanat Sudthongkong, K. Chamnongthai, Y. Jiraraksopakun","doi":"10.23919/APSIPAASC55919.2022.9979938","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979938","url":null,"abstract":"Early screening for the Obstructive Sleep Apnea (OSA), especially the first grade of Apnea-Hypopnea Index (AHI), can reduce risk and improve the effectiveness of timely treatment. The current gold standard technique for OSA diagnosis is Polysomnography (PSG), but the technique must be performed in a specialized laboratory with an expert and requires many sensors attached to a patient. Hence, it is costly and may not be convenient for a self-test by the patient. The characteristic of snore sounds has recently been used to screen the OSA and more likely to identify the abnormality of breathing conditions. Therefore, this study proposes a deep learning model to classify the OSA based on snore sounds. The snore sound data of 5 OSA patients were selected from the opened-source PSG- Audio data by the Sleep Study Unit of the Sismanoglio-Amalia Fleming General Hospital of Athens [1]. 2,439 snoring and breathing-related sound segments were extracted and divided into 3 groups of 1,020 normal snore sounds, 1,185 apnea or hypopnea snore sounds, and 234 non-snore sounds. All sound segments were separated into 60% training, 20% validation, and 20% test sets, respectively. The mean of Mel-Frequency Cepstral Coefficients (MFCC) of a sound segment were computed as the feature inputs of the deep learning model. Three fully connected layers were used in this deep learning model to classify into three groups as (1) normal snore sounds, (2) abnormal (apnea or hypopnea) snore sounds, and (3) non-snore sounds. The result showed that the model was able to correctly classify 85.2459%. Therefore, the model is promising to use snore sounds for screening OSA.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121203955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
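A sketch of the feature/classifier pipeline described above, under assumed hyperparameters (the number of MFCCs, hidden sizes, and file names are not from the paper): mean MFCCs of a sound segment feed a three-layer fully connected classifier with three outputs.

```python
import librosa
import torch
import torch.nn as nn

def segment_features(path, n_mfcc=13):
    # Mean MFCC vector over all frames of one snoring/breathing segment.
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return torch.tensor(mfcc.mean(axis=1), dtype=torch.float32)

# Three fully connected layers, three classes:
# normal snore / apnea-hypopnea snore / non-snore.
classifier = nn.Sequential(
    nn.Linear(13, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 3))

logits = classifier(segment_features('segment.wav').unsqueeze(0))
probs = torch.softmax(logits, dim=-1)
```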
Evaluation of Cognitive Test Results Using Concentration Estimation from Facial Videos
Terumi Umematsu, M. Tsujikawa, H. Sawada
{"title":"Evaluation of Cognitive Test Results Using Concentration Estimation from Facial Videos","authors":"Terumi Umematsu, M. Tsujikawa, H. Sawada","doi":"10.23919/APSIPAASC55919.2022.9980211","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980211","url":null,"abstract":"In this paper, we propose a method of discriminating between concentration and non-concentration on the basis of facial videos, and we confirm the usefulness of excluding cognitive test results when a user has not been concentrating. In a preliminary experiment, we have confirmed that level of concentration has a strong impact on correct answer rates in memory tests. Our proposed concentration/non-concentration discrimination method uses 15 features extracted from facial videos, including blinking, gazing, and facial expressions (Action Units), and discriminates between concentration and non-concentration, which are reflected in terms of a binary correct answer label set based on subjectively rated concentration levels. In the preliminary experiment, memory test scores during non-concentration states were lower than those during concentration states by an average of 18%. This has usually been included as measurement error, and, by excluding scores during non-concentration states using the proposed method, measurement error was reduced to 4%. The proposed method is shown to be capable of obtaining test results that indicate true cognitive functions when people are concentrating, making possible a more accurate understanding of cognitive functions.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121336943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
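A rough, hedged sketch only: assuming the 15 facial features (blink, gaze, and Action Unit statistics) have already been extracted per test segment, a simple binary concentration/non-concentration classifier could be trained as follows. The data here are random placeholders, not the paper's features or labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 15))                 # placeholder: 15 facial features per test segment
y = rng.integers(0, 2, 200)               # placeholder: 1 = concentrating, 0 = not

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())   # mean accuracy under 5-fold cross-validation

# Test segments predicted as non-concentration would then be excluded
# before aggregating cognitive test scores.
```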