APSIPA Transactions on Signal and Information Processing: Latest Articles

Two-stage pyramidal convolutional neural networks for image colorization
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-10-08 DOI: 10.1017/ATSIP.2021.13
Yu-Jen Wei, Tsu-Tsai Wei, Tien-Ying Kuo, Po-Chyi Su
Abstract: The development of colorization algorithms through deep learning has become the current research trend. These algorithms colorize grayscale images automatically and quickly, but the colors produced are usually subdued and have low saturation. This research addresses this issue by presenting a two-stage convolutional neural network (CNN) structure, with the first and second stages being a chroma map generation network and a refinement network, respectively. To begin, we convert the color space of an image from RGB to HSV and predict its low-resolution chroma components, thereby reducing the computational complexity. The first-stage output is then zoomed in and its detail is enhanced with a pyramidal CNN, resulting in a colorized image. Experiments show that, while using fewer parameters, our method produces results with more realistic color and higher saturation than existing methods.
Citations: 1
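The RGB-to-HSV conversion used as the preprocessing step above is a standard transform. As a minimal illustration (not the authors' code; the function name and vectorized formulation are ours), it might look like this:

```python
import numpy as np

def rgb_to_hsv(img):
    # img: float array in [0, 1], shape (H, W, 3); returns H, S, V all in [0, 1]
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    v = img.max(axis=-1)                      # value = maximum channel
    c = v - img.min(axis=-1)                  # chroma = max - min
    s = np.where(v > 0, c / np.maximum(v, 1e-12), 0.0)
    # the hue sector depends on which channel attains the maximum
    h = np.zeros_like(v)
    mask = c > 0
    rmax = mask & (v == r)
    gmax = mask & (v == g) & ~rmax
    bmax = mask & ~rmax & ~gmax
    h[rmax] = ((g - b)[rmax] / c[rmax]) % 6
    h[gmax] = (b - r)[gmax] / c[gmax] + 2
    h[bmax] = (r - g)[bmax] / c[bmax] + 4
    return np.stack([h / 6.0, s, v], axis=-1)
```

Predicting only the chroma (H, S) channels at low resolution, as the paper does, is cheaper than predicting full-resolution RGB because the luminance-like V channel is already given by the grayscale input.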
3D skeletal movement-enhanced emotion recognition networks
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-08-05 DOI: 10.1017/ATSIP.2021.11
Jiaqi Shi, Chaoran Liu, C. Ishi, H. Ishiguro
Abstract: Automatic emotion recognition has become an important trend in the fields of human–computer interaction and artificial intelligence. Although gesture is one of the most important components of nonverbal communication and has a considerable impact on emotion perception, it is rarely considered in emotion recognition studies. An important reason is the lack of large open-source emotional databases containing skeletal movement data. In this paper, we extract three-dimensional skeleton information from videos and apply the method to the IEMOCAP database to add a new modality. We propose an attention-based convolutional neural network that takes the extracted data as input to predict the speakers' emotional state. We also propose a graph attention-based fusion method that combines our model with models using other modalities, to provide complementary information in the emotion classification task and effectively fuse multimodal cues. The combined model utilizes audio signals, text information, and skeletal data, and significantly outperforms the bimodal model and other fusion strategies, proving the effectiveness of the method.
Citations: 1
Compression efficiency analysis of AV1, VVC, and HEVC for random access applications
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-07-13 DOI: 10.1017/ATSIP.2021.10
Tung Nguyen, D. Marpe
Abstract: AOM Video 1 (AV1) and Versatile Video Coding (VVC) are the outcome of two recent independent video coding technology developments. Whereas VVC is the successor of High Efficiency Video Coding (HEVC) in the lineage of international video coding standards jointly developed by ITU-T and ISO/IEC within an open and public standardization process, AV1 is a video coding scheme developed by the industry consortium Alliance for Open Media (AOM), with its technological roots in Google's proprietary VP9 codec. This paper presents a compression efficiency evaluation of the AV1, VVC, and HEVC video coding schemes in a typical video compression application requiring random access, an important property without which essential functionalities in digital video broadcasting or streaming could not be provided. For the evaluation, we employed a controlled experimental environment that basically follows the guidelines specified in the Common Test Conditions of the Joint Video Experts Team, using the freely available reference software implementations of the three coding schemes. Depending on the application-specific frequency of random access points, the experimental results show average bit-rate savings of about 10–15% for AV1 and 36–37% for the VVC reference encoder implementation (VTM), both relative to the HEVC reference encoder implementation (HM), on a test set of video sequences with different content and resolution characteristics. A direct comparison between VTM and AV1 reveals average bit-rate savings of about 25–29% for VTM, while the average encoding and decoding run times of VTM relative to those of AV1 are around 300% and 270%, respectively.
Citations: 2
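Average bit-rate savings of this kind are conventionally reported as the Bjøntegaard delta-rate (BD-rate): fit cubic polynomials to log-rate versus PSNR for each codec and integrate the gap over the overlapping quality range. The sketch below illustrates that standard calculation under the usual cubic-fit formulation; it is not the exact JVET tooling and the function name is ours:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    # Average bit-rate difference (%) of the test codec vs. the reference
    # at equal PSNR; a negative result means the test codec saves bits.
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    c_ref = P.polyfit(psnr_ref, lr_ref, 3)    # cubic fit: PSNR -> log-rate
    c_test = P.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    i_ref, i_test = P.polyint(c_ref), P.polyint(c_test)
    avg_diff = ((P.polyval(hi, i_test) - P.polyval(lo, i_test))
                - (P.polyval(hi, i_ref) - P.polyval(lo, i_ref))) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

For example, a codec that needs exactly half the bit rate of the reference at every PSNR point yields a BD-rate of -50%.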
TGHop: an explainable, efficient, and lightweight method for texture generation
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-07-08 DOI: 10.1017/ATSIP.2021.15
Xuejing Lei, Ganning Zhao, Kaitai Zhang, C. J. Kuo
Abstract: An explainable, efficient, and lightweight method for texture generation, called TGHop (an acronym of Texture Generation PixelHop), is proposed in this work. Although visually pleasant texture can be synthesized by deep neural networks, the associated models are large in size, difficult to explain in theory, and computationally expensive to train. In contrast, TGHop is small in model size, mathematically transparent, efficient in training and inference, and able to generate high-quality texture. Given an exemplary texture, TGHop first crops many sample patches out of it to form a collection called the source. It then analyzes the pixel statistics of samples from the source and obtains a sequence of fine-to-coarse subspaces for these patches using the PixelHop++ framework. To generate texture patches, TGHop begins with the coarsest subspace, called the core, and generates samples in each subspace by following the distribution of real samples. Finally, texture patches are stitched to form texture images of a large size. Experimental results demonstrate that TGHop can generate texture images of superior quality with a small model size and at a fast speed.
Citations: 11
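The first step above, forming "the source" by cropping many sample patches from one exemplar, can be illustrated with a simple random cropper (the function name, patch size, and count are our assumptions, not the paper's settings):

```python
import numpy as np

def crop_patches(texture, size=32, count=100, seed=0):
    # Randomly crop `count` square patches of side `size` from one exemplar image.
    rng = np.random.default_rng(seed)
    h, w = texture.shape[:2]
    ys = rng.integers(0, h - size + 1, count)   # top-left corners, inclusive range
    xs = rng.integers(0, w - size + 1, count)
    return np.stack([texture[y:y + size, x:x + size] for y, x in zip(ys, xs)])
```

The resulting patch collection is what PixelHop++ would then decompose into fine-to-coarse subspaces.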
A protection method of trained CNN model with a secret key from unauthorized access
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-05-31 DOI: 10.1017/ATSIP.2021.9
AprilPyone Maungmaung, H. Kiya
Abstract: In this paper, we propose a novel method for protecting convolutional neural network models with a secret key set so that unauthorized users without the correct key set cannot access trained models. The method protects not only against copyright infringement but also the functionality of a model from unauthorized access, without any noticeable overhead. We introduce three block-wise transformations with a secret key set to generate learnable transformed images: pixel shuffling, negative/positive transformation, and format-preserving Feistel-based encryption. Protected models are trained on transformed images. Experiments with the CIFAR and ImageNet datasets show that the performance of a protected model is close to that of non-protected models when the key set is correct, while the accuracy drops severely when an incorrect key set is given. The protected model is also demonstrated to be robust against various attacks. Compared with the state-of-the-art model protection with passports, the proposed method adds no extra layers to the network, and therefore incurs no overhead during training and inference.
Citations: 18
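The first of the three transformations, block-wise pixel shuffling with a secret key, can be sketched as below. This is an illustrative reconstruction, not the authors' implementation; the function name, the default block size, and the use of the key as an RNG seed are our assumptions:

```python
import numpy as np

def blockwise_shuffle(img, key, block=4, inverse=False):
    # Shuffle pixel positions inside every (block x block) tile with one
    # key-derived permutation shared by all tiles. H and W must be
    # divisible by `block`. With inverse=True, the shuffle is undone.
    perm = np.random.default_rng(key).permutation(block * block)
    if inverse:
        perm = np.argsort(perm)                 # inverse permutation
    h, w, c = img.shape
    out = np.empty_like(img)
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = img[y:y + block, x:x + block].reshape(-1, c)
            out[y:y + block, x:x + block] = tile[perm].reshape(block, block, c)
    return out
```

Training on images transformed this way ties the model's accuracy to the key: only inputs shuffled with the correct key match the statistics the model saw during training.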
The future of biometrics technology: from face recognition to related applications
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-05-28 DOI: 10.1017/ATSIP.2021.8
Hitoshi Imaoka, H. Hashimoto, Koichi Takahashi, Akinori F. Ebihara, Jianquan Liu, Akihiro Hayasaka, Yusuke Morishita, K. Sakurai
Abstract: Biometric recognition technologies have become more important in modern society due to their convenience, alongside recent informatization and the spread of network services. Among such technologies, face recognition is one of the most convenient and practical because it enables authentication from a distance without requiring any manual authentication operation. However, face recognition is susceptible to changes in the appearance of faces due to aging, surrounding lighting, and posture, and a number of technical challenges remain to be resolved. Recently, remarkable progress has been made thanks to the advent of deep learning methods. In this position paper, we provide an overview of face recognition technology and introduce its related applications, including face presentation attack detection, gaze estimation, person re-identification, and image data mining. We also discuss the research challenges that still need to be addressed and resolved.
Citations: 9
Audio-to-score singing transcription based on a CRNN-HSMM hybrid model
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-04-20 DOI: 10.1017/ATSIP.2021.4
Ryo Nishikimi, Eita Nakamura, Masataka Goto, Kazuyoshi Yoshii
Abstract: This paper describes an automatic singing transcription (AST) method that estimates a human-readable musical score of a sung melody from an input music signal. Because of the considerable pitch and temporal variation of a singing voice, a naive cascading approach that estimates an F0 contour and quantizes it with estimated tatum times cannot avoid many pitch and rhythm errors. To solve this problem, we formulate a unified generative model of a music signal consisting of a semi-Markov language model, which represents the generative process of latent musical notes conditioned on musical keys, and an acoustic model based on a convolutional recurrent neural network (CRNN), which represents the generative process of the observed music signal from the notes. The resulting CRNN-HSMM hybrid model enables us to estimate the most likely musical notes from a music signal with the Viterbi algorithm, while leveraging both grammatical knowledge about musical notes and the expressive power of the CRNN. Experimental results show that the proposed method outperforms the conventional state-of-the-art method, and that integrating the musical language model with the acoustic model has a positive effect on AST performance.
Citations: 10
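The Viterbi decoding mentioned above is standard dynamic programming over log-probabilities. A generic log-domain sketch (not the paper's CRNN-HSMM code; in the paper the observation scores would come from the CRNN and the transition scores from the semi-Markov language model) is:

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    # log_init: (S,) initial state log-probs; log_trans: (S, S) with
    # [prev, cur] indexing; log_obs: (T, S) per-frame observation scores.
    # Returns the most likely state sequence of length T.
    T, S = log_obs.shape
    delta = log_init + log_obs[0]             # best score ending in each state
    back = np.zeros((T, S), dtype=int)        # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (prev, cur) candidate scores
        back[t] = scores.argmax(axis=0)       # best predecessor per state
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]              # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With uniform transitions this reduces to per-frame argmax; the language model's non-uniform transitions are what suppress musically implausible note sequences.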
Speech emotion recognition based on listener-dependent emotion perception models
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-04-20 DOI: 10.1017/ATSIP.2021.7
Atsushi Ando, Takeshi Mori, Satoshi Kobashikawa, T. Toda
Abstract: This paper presents a novel speech emotion recognition scheme that leverages the individuality of emotion perception. Most conventional methods simply poll multiple listeners and directly model the majority decision as the perceived emotion. However, emotion perception varies with the listener, which forces conventional single-model methods to learn complex mixtures of emotion perception criteria. To mitigate this problem, we propose a majority-voted emotion recognition framework that constructs listener-dependent (LD) emotion recognition models. The LD models can estimate not only listener-wise perceived emotion but also the majority decision, obtained by averaging the outputs of the multiple LD models. Three LD models are introduced, based on fine-tuning, auxiliary input, and sub-layer weighting, all inspired by successful domain-adaptation frameworks in various speech processing tasks. Experiments on two emotional speech datasets demonstrate that the proposed approach outperforms conventional emotion recognition frameworks in both majority-voted and listener-wise perceived emotion recognition.
Citations: 3
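Recovering the majority decision by averaging the outputs of multiple listener-dependent models can be sketched as follows (an illustration of the averaging idea only, not the paper's code; the function name is ours):

```python
import numpy as np

def majority_emotion(listener_logits):
    # listener_logits: (L, C) array, one logit vector per listener-dependent model.
    # Softmax each listener's logits, average across listeners, pick the top class.
    shifted = listener_logits - listener_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)     # per-listener posterior
    return int(probs.mean(axis=0).argmax())       # averaged = majority estimate
```

Averaging posteriors rather than hard labels lets a confident minority listener still influence the decision when the others are uncertain.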
Automatic Deception Detection using Multiple Speech and Language Communicative Descriptors in Dialogs
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-04-16 DOI: 10.1017/ATSIP.2021.6
Huang-Cheng Chou, Yi-Wen Liu, Chi-Chun Lee
Abstract: While deceptive behaviors are a natural part of human life, it is well known that humans are generally bad at detecting deception. In this study, we present an automatic deception detection framework that comprehensively integrates prior domain knowledge of deceptive behavior. Specifically, we compute acoustics, textual information, implicatures with non-verbal behaviors, and conversational temporal dynamics to improve automatic deception detection in dialogs. The proposed model reaches state-of-the-art performance on the Daily Deceptive Dialogues corpus of Mandarin (DDDM), with 80.61% unweighted accuracy recall in deception recognition. In further analyses, we reveal that (i) the deceivers' deception behaviors can be observed from the interrogators' behaviors in the conversational temporal dynamics features, and (ii) some of the acoustic features (e.g. loudness and MFCC) and textual features are significant and effective indicators for detecting deceptive behaviors.
Citations: 5
Analyzing public opinion on COVID-19 through different perspectives and stages
IF 3.2
APSIPA Transactions on Signal and Information Processing Pub Date: 2021-03-17 (eCollection 2021-01-01) DOI: 10.1017/ATSIP.2021.5
Yuqi Gao, Hang Hua, Jiebo Luo
Abstract: In recent months, COVID-19 has become a global pandemic and had a huge impact on the world. People under different conditions have very different attitudes toward the epidemic. Due to the real-time and large-scale nature of social media, we can continuously obtain a massive amount of public opinion information related to the epidemic. In particular, researchers may ask questions such as "how is the public reacting to COVID-19 in China during different stages of the pandemic?", "what factors affect the public opinion orientation in China?", and so on. To answer such questions, we analyze pandemic-related public opinion on Weibo, China's largest social media platform. We first collected a large number of COVID-19-related public opinion microblogs, then used a sentiment classifier to recognize and analyze the opinions of different user groups. In the collected sentiment-oriented microblogs, we track public opinion through the different stages of the COVID-19 pandemic. Furthermore, we analyze key factors that might affect public opinion on COVID-19 (e.g. users in different provinces or with different education levels). Empirical results show that public opinion varies along with these key factors. We also analyze public attitudes on different public-concerning topics, such as staying at home and quarantine. In summary, we uncover interesting patterns of users and events as an insight into the world through the lens of a major crisis.
Citations: 0