{"title":"Fire Detection in H.264 Compressed Video","authors":"Murat Muhammet Savci, Yasin Yildirim, Gorkem Saygili, B. U. Töreyin","doi":"10.1109/ICASSP.2019.8683666","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683666","url":null,"abstract":"In this paper, we propose a compressed-domain fire detection algorithm for H.264 video using macroblock types and a Markov model. The compressed-domain method does not require decoding to the pixel domain; instead, a syntax parser extracts syntax elements that are available only in the compressed domain. Our method extracts only the macroblock type and the corresponding macroblock address. Fire and non-fire Markov models are trained offline and used for evaluation. Our experiments show that the algorithm successfully detects and identifies fire events in the compressed domain, even though only a small fraction of the data is used in the process.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"289 1","pages":"8310-8314"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84153498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
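The record above classifies macroblock-type sequences with offline-trained fire and non-fire Markov models. A minimal sketch of that decision rule, with an illustrative three-symbol macroblock alphabet and made-up transition probabilities (the paper's actual H.264 type set and trained matrices are not given here):

```python
import numpy as np

# Hypothetical macroblock-type alphabet: 0 = intra, 1 = inter, 2 = skip.
# Transition matrices (rows sum to 1) would be learned offline from
# labelled fire / non-fire training videos; these values are illustrative.
P_fire    = np.array([[0.6, 0.3, 0.1],
                      [0.4, 0.5, 0.1],
                      [0.3, 0.3, 0.4]])
P_nonfire = np.array([[0.1, 0.3, 0.6],
                      [0.1, 0.4, 0.5],
                      [0.1, 0.2, 0.7]])

def log_likelihood(seq, P):
    """Log-probability of a macroblock-type sequence under a Markov chain."""
    return sum(np.log(P[a, b]) for a, b in zip(seq, seq[1:]))

def classify(seq):
    """Pick the model under which the observed sequence is more likely."""
    if log_likelihood(seq, P_fire) > log_likelihood(seq, P_nonfire):
        return "fire"
    return "non-fire"

print(classify([0, 0, 1, 0, 1, 0]))  # intra/inter-heavy sequence -> "fire"
```

The flickering of flames tends to change macroblock coding decisions frame to frame, which is why a transition model over types (rather than pixel data) can carry the signal.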
{"title":"Passive Detection and Discrimination of Body Movements in the sub-THz Band: A Case Study","authors":"S. Kianoush, S. Savazzi, V. Rampa","doi":"10.1109/ICASSP.2019.8682165","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682165","url":null,"abstract":"Passive radio sensing is a well-established research topic in which radio-frequency (RF) devices are used as real-time virtual probes that detect the presence and movement(s) of one or more (non-instrumented) subjects. However, radio sensing methods usually employ frequencies in the unlicensed 2.4−5.0 GHz bands, where multipath effects strongly limit their accuracy, thus reducing their wide acceptance. On the contrary, sub-terahertz (sub-THz) radiation, due to its very short wavelength and reduced multipath effects, is well suited for high-resolution body occupancy detection and vision applications. In this paper, for the first time, we adopt radio devices emitting in the 100 GHz band to process an image of the environment for body motion discrimination inside a workspace area. Movement detection is based on the real-time analysis of body-induced signatures that are estimated from sub-THz measurements and then processed by specific neural network-based classifiers. Experimental trials validate the proposed methods and compare their performance in an industrial safety monitoring application.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"5 1","pages":"1597-1601"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84182799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning the Spiral Sharing Network with Minimum Salient Region Regression for Saliency Detection","authors":"Zukai Chen, Xin Tan, Hengliang Zhu, Shouhong Ding, Lizhuang Ma, Haichuan Song","doi":"10.1109/ICASSP.2019.8682531","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682531","url":null,"abstract":"With the development of convolutional neural networks (CNNs), saliency detection methods have made great progress in recent years. However, previous methods sometimes mistakenly highlight non-salient regions, especially in complex backgrounds. To solve this problem, a two-stage method for saliency detection is proposed in this paper. In the first stage, a network is used to regress the minimum salient region (RMSR) containing all salient objects. In the second stage, in order to fuse multi-level features, the spiral sharing network (SSN) is proposed for pixel-level detection on the result of RMSR. Experimental results on four public datasets show that our model outperforms state-of-the-art approaches.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"125 1","pages":"1667-1671"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72836600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autoencoding HRTFS for DNN Based HRTF Personalization Using Anthropometric Features","authors":"Tzu-Yu Chen, Tzu-Hsuan Kuo, T. Chi","doi":"10.1109/ICASSP.2019.8683814","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683814","url":null,"abstract":"We propose a deep neural network (DNN) based approach to synthesize the magnitude of personalized head-related transfer functions (HRTFs) using anthropometric features of the user. To mitigate over-fitting when the training dataset is small, we build an autoencoder for dimensionality reduction, establishing a compact feature set that represents the raw HRTFs. We then combine the decoder part of the autoencoder with a smaller DNN to synthesize the magnitude HRTFs. In this way, the complexity of the neural networks is greatly reduced, preventing the unstable, high-variance results caused by overfitting. The proposed approach is compared with a baseline DNN model without an autoencoder, using the log-spectral distortion (LSD) metric to evaluate performance. Experimental results show that the proposed approach reduces the LSD of estimated HRTFs with greater stability.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"36 1","pages":"271-275"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76405984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
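The record above pairs an autoencoder (for dimensionality reduction of HRTFs) with a smaller synthesis DNN. A toy sketch of the first ingredient only, a linear autoencoder trained by gradient descent on random stand-in data (the sizes, the random data, and the linear/no-activation choice are all illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_latent, n_samples = 64, 8, 200
# Stand-in for a set of magnitude HRTFs (rows = measurements, cols = frequency bins).
H = rng.standard_normal((n_samples, n_freq))

# Linear autoencoder: encoder W_enc projects to the small latent space,
# decoder W_dec reconstructs; trained on mean-squared reconstruction error.
W_enc = 0.1 * rng.standard_normal((n_freq, n_latent))
W_dec = 0.1 * rng.standard_normal((n_latent, n_freq))

def loss(H, W_enc, W_dec):
    return float(np.mean((H @ W_enc @ W_dec - H) ** 2))

loss_before = loss(H, W_enc, W_dec)
lr = 1e-3
for _ in range(500):
    Z = H @ W_enc                            # latent codes (the reduced feature set)
    E = Z @ W_dec - H                        # reconstruction error
    g_dec = Z.T @ E / n_samples              # gradient w.r.t. decoder (up to a constant)
    g_enc = H.T @ (E @ W_dec.T) / n_samples  # gradient w.r.t. encoder (up to a constant)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
loss_after = loss(H, W_enc, W_dec)
```

In the paper's scheme, the trained decoder would then be reused: a small DNN maps anthropometric features to the latent code, and the frozen decoder expands that code back to a full magnitude HRTF.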
{"title":"Multicast Beamforming Using Semidefinite Relaxation and Bounded Perturbation Resilience","authors":"Jochen Fink, R. Cavalcante, S. Stańczak","doi":"10.1109/ICASSP.2019.8682325","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682325","url":null,"abstract":"Semidefinite relaxation followed by randomization is a well-known approach for approximating a solution to the NP-hard max-min fair multicast beamforming problem. While providing a good approximation to the optimal solution, this approach commonly involves the use of computationally demanding interior point methods. In this study, we propose a solution based on superiorization of bounded perturbation resilient iterative operators that scales to systems with a large number of antennas. We show that this method outperforms the randomization techniques in many cases, while using only computationally simple operations.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"75 1","pages":"4749-4753"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87693878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"1-D Convolutional Neural Networks for Signal Processing Applications","authors":"S. Kiranyaz, T. Ince, Osama Abdeljaber, Onur Avcı, M. Gabbouj","doi":"10.1109/ICASSP.2019.8682194","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682194","url":null,"abstract":"1D Convolutional Neural Networks (CNNs) have recently become the state-of-the-art technique for crucial signal processing applications such as patient-specific ECG classification, structural health monitoring, anomaly detection in power electronics circuitry, and motor-fault detection. This is an expected outcome, as there are numerous advantages to using an adaptive and compact 1D CNN instead of its conventional (2D) deep counterparts. First of all, compact 1D CNNs can be efficiently trained with a limited dataset of 1D signals, while 2D deep CNNs, besides requiring 1D-to-2D data transformation, usually need massive datasets, e.g., on the \"Big Data\" scale, in order to prevent the well-known \"overfitting\" problem. 1D CNNs can be applied directly to the raw signal (e.g., current, voltage, vibration, etc.) without requiring any pre- or post-processing such as feature extraction, selection, dimension reduction, or denoising. Furthermore, due to the simple and compact configuration of such adaptive 1D CNNs, which perform only linear 1D convolutions (scalar multiplications and additions), a real-time and low-cost hardware implementation is feasible. This paper reviews the major signal processing applications of compact 1D CNNs with a brief theoretical background. We present their state-of-the-art performance and conclude by focusing on some major properties. Keywords – 1-D CNNs, Biomedical Signal Processing, SHM","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"8360-8364"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87978515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
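The abstract above stresses that a 1D CNN layer performs only linear 1D convolutions, i.e. scalar multiplications and additions, directly on the raw signal. A minimal sketch of one such layer's forward pass in plain NumPy (the filter weights here are illustrative, not learned):

```python
import numpy as np

def conv1d(x, kernel, bias=0.0):
    """Valid-mode 1D convolution: only scalar multiplications and additions."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

def relu(v):
    """Elementwise nonlinearity applied after the linear filtering."""
    return np.maximum(v, 0.0)

# One conv "layer": filter responses over a raw vibration-like signal.
signal = np.sin(np.linspace(0, 8 * np.pi, 64))
edge_filter = np.array([-1.0, 0.0, 1.0])   # illustrative weights, not trained
features = relu(conv1d(signal, edge_filter))
print(features.shape)  # (62,)
```

Because the inner loop is a short dot product per output sample, this maps cheaply onto fixed-point or FPGA hardware, which is the real-time/low-cost argument the abstract makes.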
{"title":"An Empirical Study of Speech Processing in the Brain by Analyzing the Temporal Syllable Structure in Speech-input Induced EEG","authors":"Rini A. Sharon, Shrikanth S. Narayanan, M. Sur, H. Murthy","doi":"10.1109/ICASSP.2019.8683572","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683572","url":null,"abstract":"The clinical applicability of electroencephalography (EEG) is well established; however, the use of EEG for constructing brain-computer interfaces that serve as communication platforms is relatively recent. To provide more natural means of communication, there is an increasing focus on bringing together speech and EEG signal processing. Quantifying the way our brain processes speech is one way of approaching the problem of speech recognition using brain waves. This paper analyzes the feasibility of recognizing syllable-level units by studying the temporal structure of speech reflected in EEG signals. The slowly varying component of the delta-band EEG (0.3-3 Hz) is present in all other EEG frequency bands. Analysis shows that removing this delta trend from the EEG signals yields signals that reveal syllable-like structure. Using a 25-syllable framework, classification of EEG data obtained from 13 subjects yields promising results, underscoring the potential of revealing speech-related temporal structure in EEG.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"33 1","pages":"4090-4094"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86300490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
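The record above hinges on removing the slow delta-band trend from EEG to expose syllable-like temporal structure. A rough sketch of one way such detrending could look, using a moving-average trend estimate on synthetic data (the sampling rate, window length, and signals are assumptions for illustration; the paper's actual filtering method is not specified here):

```python
import numpy as np

def remove_slow_trend(x, fs, window_s=1.0):
    """Subtract a moving-average estimate of the slow (delta-range) trend.
    A ~window_s-second window suppresses components slower than ~1/window_s Hz."""
    w = max(1, int(fs * window_s))
    trend = np.convolve(x, np.ones(w) / w, mode="same")
    return x - trend

fs = 250  # Hz; an assumed EEG sampling rate
t = np.arange(0, 2.0, 1 / fs)
drift = np.sin(2 * np.pi * 0.5 * t)              # slow delta-range drift
syllable_like = 0.2 * np.sin(2 * np.pi * 8 * t)  # faster rhythmic structure
detrended = remove_slow_trend(drift + syllable_like, fs)
```

After subtraction, the large slow drift is mostly gone while the faster rhythmic component survives, which is the qualitative effect the abstract attributes to delta-trend removal.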
{"title":"CRF-based Single-stage Acoustic Modeling with CTC Topology","authors":"Hongyu Xiang, Zhijian Ou","doi":"10.1109/ICASSP.2019.8682256","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682256","url":null,"abstract":"In this paper, we develop conditional random field (CRF) based single-stage (SS) acoustic modeling with a connectionist temporal classification (CTC) inspired state topology, called CTC-CRF for short. CTC-CRF is conceptually simple: it implements a CRF layer, with the special state topology, on top of features generated by the bottom neural network. Like SS-LF-MMI (lattice-free maximum-mutual-information), CTC-CRFs can be trained from scratch (flat-start), eliminating GMM-HMM pre-training and tree-building. Evaluation experiments are conducted on the WSJ, Switchboard and Librispeech datasets. In a head-to-head comparison, the CTC-CRF model using simple bidirectional LSTMs consistently outperforms the strong SS-LF-MMI across all three benchmark datasets, for both mono-phone and mono-char units. Additionally, CTC-CRFs avoid some ad-hoc operations in SS-LF-MMI.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"48 1","pages":"5676-5680"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86079634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
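The CTC state topology referenced above maps frame-level label paths to output sequences by merging consecutive repeats and removing the blank symbol. A minimal sketch of that standard collapse rule (the blank character here is an arbitrary choice):

```python
def ctc_collapse(path, blank="-"):
    """Map a frame-level CTC path to its label sequence:
    merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return "".join(out)

print(ctc_collapse("--cc-aa--t-"))  # prints "cat"
```

Note that a blank between two identical symbols keeps them distinct ("aa-a" collapses to "aa", not "a"), which is how CTC topology represents doubled letters or repeated phones.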
{"title":"Enhanced Virtual Singers Generation by Incorporating Singing Dynamics to Personalized Text-to-speech-to-singing","authors":"Kantapon Kaewtip, F. Villavicencio, Fang-Yu Kuo, Mark Harvilla, I. Ouyang, P. Lanchantin","doi":"10.1109/ICASSP.2019.8682968","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682968","url":null,"abstract":"We present in this work a strategy to enhance the quality of Text-to-Speech (TTS) based singing voice generation. Speech-to-singing refers to techniques that transform a spoken voice into singing, mainly by manipulating the duration and pitch of a spoken version of a song’s lyrics. While this strategy efficiently preserves the speaker identity, the generated singing is not always perceived as fully natural, since vocal conditions generally change between spoken and singing voice. By incorporating speaker-independent natural singing information into TTS-based Speech-to-Singing (STS), we positively impact the sound quality (e.g. reducing hoarseness), as shown in the subjective evaluation reported at the end of this paper.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"6960-6964"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82804455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor Super-resolution for Seismic Data","authors":"Songjie Liao, Xiao-Yang Liu, Feng Qian, Miao Yin, Guangmin Hu","doi":"10.1109/ICASSP.2019.8683419","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683419","url":null,"abstract":"In this paper, we propose a novel method for generating high-granularity three-dimensional (3D) seismic data from low-granularity data based on tensor sparse coding, which jointly trains a high-granularity dictionary and a low-granularity dictionary. First, considering the high-dimensional properties of seismic data, we introduce tensor sparse coding to seismic data interpolation. Second, we propose that the dictionary pair trained on low-granularity and high-granularity seismic data shares the same sparse representation, which is then used to recover high-granularity data with the high-granularity dictionary. Finally, experiments on seismic data from an actual field show that the proposed method effectively performs seismic trace interpolation and can improve the resolution of seismic data imaging.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"8598-8602"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83288224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
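The premise above is that a jointly trained low-/high-granularity dictionary pair shares one sparse code, so a code estimated from low-granularity data can be decoded with the high-granularity dictionary. A toy sketch of that recovery step (random dictionaries and a greedy matching-pursuit-style coder stand in for the paper's jointly trained dictionaries and full tensor sparse coding):

```python
import numpy as np

rng = np.random.default_rng(1)
n_lo, n_hi, n_atoms = 16, 64, 32

# In the paper these two dictionaries are trained jointly so that paired
# patches share one sparse code; here they are random for illustration.
D_lo = rng.standard_normal((n_lo, n_atoms))
D_hi = rng.standard_normal((n_hi, n_atoms))

def sparse_code(y, D, k=4):
    """Greedy matching-pursuit-style coder: pick the atom most correlated
    with the residual, project it out, repeat k times."""
    z = np.zeros(D.shape[1])
    r = y.astype(float).copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ r)))
        step = (D[:, j] @ r) / (D[:, j] @ D[:, j])
        z[j] += step
        r -= step * D[:, j]
    return z

y_lo = rng.standard_normal(n_lo)   # observed low-granularity trace patch
z = sparse_code(y_lo, D_lo)        # code estimated from the low-granularity dictionary
y_hi = D_hi @ z                    # decoded as a high-granularity patch
print(y_hi.shape)  # (64,)
```

The shared-code assumption is what lets a 16-sample observation produce a 64-sample reconstruction: all the extra detail comes from the high-granularity dictionary learned offline.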