{"title":"Neural Oracle Search on N-BEST Hypotheses","authors":"Ehsan Variani, Tongzhou Chen, J. Apfel, B. Ramabhadran, Seungjin Lee, P. Moreno","doi":"10.1109/ICASSP40776.2020.9054745","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054745","url":null,"abstract":"In this paper, we propose a neural search algorithm to select the most likely hypothesis using a sequence of acoustic representations and multiple hypotheses as input. The algorithm provides a sequence level score for each audio-hypothesis pair that is obtained by integrating information from multiple sources, such as the input acoustic representations, N-best hypotheses, additional 1st-pass statistics, and unpaired textual information through an external language model. These scores are then used to map the search problem of identifying the most likely hypothesis to a sequence classification problem. The definition of the proposed algorithm is broad enough to allow its use as an alternative to beam search in the 1st-pass or as a 2nd-pass, rescoring step. This algorithm achieves up to 12% relative reductions in Word Error Rate (WER) across several languages over state-of-the-art baselines with relatively few additional parameters. We also propose the use of a binary classifier gating function that can learn to trigger the 2nd-pass neural search model when the 1-best hypothesis is not the oracle hypothesis, thereby avoiding extra computation.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"50 1","pages":"7824-7828"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89039762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One-Bit Compressed Sensing Using Generative Models","authors":"Geethu Joseph, Swatantra Kafle, P. Varshney","doi":"10.1109/ICASSP40776.2020.9054212","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054212","url":null,"abstract":"In this paper, we address the classical problem of one-bit compressed sensing. We present a deep-learning-based reconstruction algorithm that relies on a generative model. The generator, which is a neural network, learns a mapping from a low-dimensional space to a higher-dimensional set comprising sparse vectors. This pre-trained generator is used to reconstruct sparse vectors from their one-bit measurements by searching over the range of the generator. Hence, the algorithm presented in this paper provides excellent reconstruction accuracy by accounting for any other possible structure in the signal apart from sparsity. Further, we provide theoretical guarantees on the reconstruction accuracy of the presented algorithm. Using numerical results, we also demonstrate the efficacy of our algorithm compared to other existing algorithms.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"3437-3441"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90375046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequential Methods for Detecting a Change in the Distribution of an Episodic Process","authors":"T. Banerjee, Edmond Adib, A. Taha, E. John","doi":"10.1109/ICASSP40776.2020.9054529","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054529","url":null,"abstract":"A new class of stochastic processes called episodic processes is introduced to model the statistical regularity of data observed in several applications in cyberphysical systems, neuroscience, and medicine. Algorithms are proposed to detect a change in the distribution of episodic processes. The algorithms can be computed recursively using finite memory and are shown to be asymptotically optimal for well-defined Bayesian or minimax stochastic optimization formulations. The application of the developed algorithms to detect a change in waveform patterns is also discussed.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"72 1","pages":"6009-6013"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90483984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACU-Net: A 3D Attention Context U-Net for Multiple Sclerosis Lesion Segmentation","authors":"Chuan Hu, Guixia Kang, Beibei Hou, Yiyuan Ma, F. Labeau, Zichen Su","doi":"10.1109/ICASSP40776.2020.9054616","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054616","url":null,"abstract":"Multiple Sclerosis (MS) lesion segmentation from MR images is important for neuroimaging analysis. MS is diffuse and multifocal, and tends to involve peripheral brain structures such as the white matter, corpus callosum, and brainstem. Recently, U-Net has achieved strong results in medical image segmentation. However, its insufficient use of context information and feature representation prevents it from segmenting MS lesions accurately. To solve this problem, a 3D attention context U-Net (ACU-Net) is proposed for MS lesion segmentation in this paper. The proposed ACU-Net includes a 3D spatial attention block, which is used to enrich the spatial details and feature representation of lesions in the decoding stage. Furthermore, in the encoding and decoding stages of the network, a 3D context-guided module is designed to guide local and surrounding information. The proposed ACU-Net was evaluated on the ISBI 2015 longitudinal MS lesion segmentation challenge dataset, and it achieved superior performance compared to the latest approaches.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"1384-1388"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80543003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EEG Connectivity-Informed Cooperative Adaptive Line Enhancer for Recognition of Brain State","authors":"S. Sanei, C. C. Took, D. Jarchi, A. Procházka","doi":"10.1109/ICASSP40776.2020.9052923","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052923","url":null,"abstract":"Bursts of sleep spindles and paroxysmal fast brain activity waveforms have frequency overlap whilst generally, paroxysmal waveforms have shorter duration than spindles. Both resemble bursts of normal alpha activity during short rests while awake with closed eyes. In this paper, it is shown that for a proposed cooperative adaptive line enhancer, which can both detect and separate such periodic bursts, the combination weights are consistently different from each other. The outcome suggests that for accurate modelling of the brain neuro-generators, the brain connectivity has to be precisely estimated and plugged into the adaptation process.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"110 1","pages":"1195-1199"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80544704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One-Shot Voice Conversion by Vector Quantization","authors":"Da-Yi Wu, Hung-yi Lee","doi":"10.1109/ICASSP40776.2020.9053854","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053854","url":null,"abstract":"In this paper, we propose a vector quantization (VQ) based one-shot voice conversion (VC) approach without any supervision on speaker label. We model the content embedding as a series of discrete codes and take the difference between quantize-before and quantize-after vector as the speaker embedding. We show that this approach has a strong ability to disentangle the content and speaker information with reconstruction loss only, and one-shot VC is thus achieved.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 1","pages":"7734-7738"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80702566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind Multi-Spectral Image Pan-Sharpening","authors":"Lantao Yu, Dehong Liu, H. Mansour, P. Boufounos, Yanting Ma","doi":"10.1109/ICASSP40776.2020.9053554","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053554","url":null,"abstract":"We address the problem of sharpening low spatial-resolution multi-spectral (MS) images with their associated misaligned high spatial-resolution panchromatic (PAN) image, based on priors on the spatial blur kernel and on the cross-channel relationship. In particular, we formulate the blind pan-sharpening problem within a multi-convex optimization framework using total generalized variation for the blur kernel and local Laplacian prior for the cross-channel relationship. The problem is solved by the alternating direction method of multipliers (ADMM), which alternately updates the blur kernel and sharpens intermediate MS images. Numerical experiments demonstrate that our approach is more robust to large misalignment errors and yields better super resolved MS images compared to state-of-the-art optimization-based and deep-learning-based algorithms.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"1429-1433"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80704696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On–The–Fly Feature Selection and Classification with Application to Civic Engagement Platforms","authors":"Yasitha Warahena Liyanage, Daphney-Stavroula Zois, C. Chelmis","doi":"10.1109/ICASSP40776.2020.9053564","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053564","url":null,"abstract":"Online feature selection and classification is crucial for time-sensitive decision making. Existing work, however, either assumes that features are independent or produces a fixed number of features for classification. Instead, we propose an optimal framework to perform joint feature selection and classification on–the–fly while relaxing the assumption on feature independence. The effectiveness of the proposed approach is shown by classifying urban issue reports on the SeeClickFix civic engagement platform. A significant reduction in the average number of features used is observed without a drop in the classification accuracy.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"88 1","pages":"3762-3766"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83829363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Constrained Encoders Correcting a Single Nucleotide Edit in DNA Storage","authors":"K. Cai, Xuan He, H. M. Kiah, T. T. Nguyen","doi":"10.1109/ICASSP40776.2020.9053256","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053256","url":null,"abstract":"A nucleotide substitution is said to occur when a base in {A, T} is substituted for a base in {C, G}, or vice versa. A recent experiment (Heckel et al. 2019) showed that a nucleotide substitution occurs with a significantly higher probability than other substitution errors. A nucleotide edit refers to a single insertion, deletion or nucleotide substitution. In this paper, we investigate codes that correct a single nucleotide edit and provide linear-time algorithms that encode binary messages into these codes of length n. Specifically, we provide an order-optimal encoder which corrects a single nucleotide edit with log n + log log n + O(1) redundant bits. We also demonstrate that the codewords obey certain runlength constraints and that the code can be modified to accommodate certain GC-content constraints.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"8827-8830"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83208674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Written Domain Numeric Grammars into End-To-End Contextual Speech Recognition Systems for Improved Recognition of Numeric Sequences","authors":"Ben Haynor, Petar S. Aleksic","doi":"10.1109/ICASSP40776.2020.9054259","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054259","url":null,"abstract":"Accurate recognition of numeric sequences is crucial for many contextual speech recognition applications. For example, a user might create a calendar event and be prompted by a virtual assistant for the time, date, and duration of the event. We propose a modular and scalable solution for improved recognition of numeric sequences. We use finite state transducers built from written domain numeric grammars to increase the likelihood of hypotheses containing matching numeric entities during beam search in an end-to-end speech recognition system. Our technique yields a relative reduction in word error rate of up to 59% on a variety of numeric sequence recognition tasks (times, percentages, digit sequences, …).","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"20 1","pages":"7809-7813"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83293891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}