Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP (Conference): Latest Publications

MAP Image Recovery with Guarantees using Locally Convex Multi-Scale Energy (LC-MUSE) Model.
Pub Date: 2025-04-01 Epub Date: 2025-03-07 DOI: 10.1109/ICASSP49660.2025.10889960
Jyothi Rikhab Chand, Mathews Jacob
{"title":"MAP Image Recovery with Guarantees using Locally Convex Multi-Scale Energy (LC-MUSE) Model.","authors":"Jyothi Rikhab Chand, Mathews Jacob","doi":"10.1109/ICASSP49660.2025.10889960","DOIUrl":"10.1109/ICASSP49660.2025.10889960","url":null,"abstract":"<p><p>We propose a multi-scale deep energy model that is strongly convex in the local neighbourhood around the data manifold to represent its probability density, with application in inverse problems. In particular, we represent the negative log-prior as a multi-scale energy model parameterized by a Convolutional Neural Network (CNN). We restrict the gradient of the CNN to be locally monotone, which constrains the model as a Locally Convex Multi-Scale Energy (LC-MuSE). We use the learned energy model in image-based inverse problems, where the formulation offers several desirable properties: i) uniqueness of the solution, ii) convergence guarantees to a minimum of the inverse problem, and iii) robustness to input perturbations. In the context of parallel Magnetic Resonance (MR) image reconstruction, we show that the proposed method performs better than the state-of-the-art convex regularizers, while the performance is comparable to plug-and-play regularizers and end-to-end trained methods.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. 
ICASSP (Conference)","volume":"2025 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12974777/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147438067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
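The guarantees listed in the abstract above stem from (local) strong convexity: a monotone gradient makes the MAP objective convex, so gradient descent reaches a unique minimizer from any initialization. Below is a scalar toy sketch of that mechanism, with the learned CNN energy replaced by a hand-written smooth convex penalty; the function names and the quadratic data term are illustrative assumptions, not the paper's model.

```python
import math

def grad_f(x, a, b, lam):
    # Gradient of f(x) = 0.5*(a*x - b)^2 + lam*sqrt(1 + x^2):
    # a quadratic data-consistency term plus a smooth convex "energy"
    # standing in for the learned LC-MuSE prior.
    return a * (a * x - b) + lam * x / math.sqrt(1.0 + x * x)

def map_estimate(x0, a=2.0, b=3.0, lam=0.5, step=0.1, iters=500):
    # Plain gradient descent; convexity guarantees a unique fixed point.
    x = x0
    for _ in range(iters):
        x -= step * grad_f(x, a, b, lam)
    return x

# Two very different initializations land on the same MAP estimate.
x_left = map_estimate(-10.0)
x_right = map_estimate(+10.0)
```

With a non-convex energy the two runs could end in different local minima; the locally monotone constraint is what rules that out near the data manifold.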
EXPLORING SELF-SUPERVISED CONTRASTIVE LEARNING OF SPATIAL SOUND EVENT REPRESENTATION.
Pub Date: 2024-04-01 Epub Date: 2024-03-18 DOI: 10.1109/icassp48485.2024.10447391
Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani
{"title":"EXPLORING SELF-SUPERVISED CONTRASTIVE LEARNING OF SPATIAL SOUND EVENT REPRESENTATION.","authors":"Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani","doi":"10.1109/icassp48485.2024.10447391","DOIUrl":"10.1109/icassp48485.2024.10447391","url":null,"abstract":"<p><p>In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. 
ICASSP (Conference)","volume":"2024 ","pages":"1281-1285"},"PeriodicalIF":0.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11268432/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141763103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
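The channel-wise augmentations described above (randomly swapping microphone order and masking whole feature channels) are simple to sketch. Everything below, the zero-masking scheme and the probabilities included, is an illustrative guess at the idea, not the authors' pipeline.

```python
import random

def channel_augment(channels, swap_prob=0.5, mask_prob=0.25, rng=None):
    # channels: list of per-microphone feature vectors (e.g. Mel or GCC).
    rng = rng or random.Random(0)
    chans = [list(ch) for ch in channels]
    if rng.random() < swap_prob:
        rng.shuffle(chans)               # random microphone-order swap
    out = []
    for ch in chans:
        if rng.random() < mask_prob:
            out.append([0.0] * len(ch))  # mask the whole channel
        else:
            out.append(ch)
        # shapes are preserved so the encoder always sees a fixed layout
    return out
```

In a contrastive setup, two independent calls on the same clip would produce the positive pair.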
GLACIER: GLASS-BOX TRANSFORMER FOR INTERPRETABLE DYNAMIC NEUROIMAGING.
Pub Date: 2023-06-01 Epub Date: 2023-05-05 DOI: 10.1109/icassp49357.2023.10097126
Usman Mahmood, Zening Fu, Vince Calhoun, Sergey Plis
{"title":"GLACIER: GLASS-BOX TRANSFORMER FOR INTERPRETABLE DYNAMIC NEUROIMAGING.","authors":"Usman Mahmood, Zening Fu, Vince Calhoun, Sergey Plis","doi":"10.1109/icassp49357.2023.10097126","DOIUrl":"10.1109/icassp49357.2023.10097126","url":null,"abstract":"<p><p>Deep learning models can perform as well or better than humans in many tasks, especially vision related. Almost exclusively, these models are used to perform classification or prediction. However, deep learning models are usually of black-box nature, and it is often difficult to interpret the model or the features. The lack of interpretability causes a restrain from applying deep learning to fields such as neuroimaging, where the results must be transparent, and interpretable. Therefore, we present a 'glass-box' deep learning model and apply it to the field of neuroimaging. Our model mixes spatial and temporal dimensions in succession to estimate dynamic connectivity between the brain's intrinsic networks. The interpretable connectivity matrices produced by our model result in beating state-of-the-art models on many tasks using multiple functional MRI datasets. More importantly, our model estimates task-based flexible connectivity matrices, unlike static methods such as Pearson's correlation coefficients.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. 
ICASSP (Conference)","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10231935/pdf/nihms-1889297.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9626893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
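For contrast with the learned dynamic connectivity described above, the static Pearson baseline can be made naively time-varying by computing one correlation per sliding window; a sketch (the window size and signals are invented):

```python
def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy) if sx and sy else 0.0

def sliding_connectivity(x, y, win):
    # One connectivity value per non-overlapping window: the "static"
    # method applied repeatedly. Glacier instead estimates such dynamic
    # connectivity end-to-end inside the model.
    return [pearson(x[i:i + win], y[i:i + win])
            for i in range(0, len(x) - win + 1, win)]
```

The windowed estimate is noisy and window-size dependent, which is part of the motivation for learning connectivity directly.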
ACOUSTICALLY-DRIVEN PHONEME REMOVAL THAT PRESERVES VOCAL AFFECT CUES.
Pub Date: 2023-06-01 Epub Date: 2023-05-05 DOI: 10.1109/icassp49357.2023.10095942
Camille Noufi, Jonathan Berger, Michael Frank, Karen Parker, Daniel L Bowling
{"title":"ACOUSTICALLY-DRIVEN PHONEME REMOVAL THAT PRESERVES VOCAL AFFECT CUES.","authors":"Camille Noufi,&nbsp;Jonathan Berger,&nbsp;Michael Frank,&nbsp;Karen Parker,&nbsp;Daniel L Bowling","doi":"10.1109/icassp49357.2023.10095942","DOIUrl":"10.1109/icassp49357.2023.10095942","url":null,"abstract":"<p><p>In this paper, we propose a method for removing linguistic information from speech for the purpose of isolating paralinguistic indicators of affect. The immediate utility of this method lies in clinical tests of sensitivity to vocal affect that are not confounded by language, which is impaired in a variety of clinical populations. The method is based on simultaneous recordings of speech audio and electroglotto-graphic (EGG) signals. The speech audio signal is used to estimate the average vocal tract filter response and amplitude envelop. The EGG signal supplies a direct correlate of voice source activity that is mostly independent of phonetic articulation. These signals are used to create a third signal designed to capture as much paralinguistic information from the vocal production system as possible-maximizing the retention of bioacoustic cues to affect-while eliminating phonetic cues to verbal meaning. To evaluate the success of this method, we studied the perception of corresponding speech audio and transformed EGG signals in an affect rating experiment with online listeners. The results show a high degree of similarity in the perceived affect of matched signals, indicating that our method is effective.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. 
ICASSP (Conference)","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495117/pdf/nihms-1926898.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10608365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
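One ingredient above, the amplitude envelope estimated from the speech audio, can be sketched as a rectify-and-smooth operation. The window length and the moving-average smoother are assumptions for illustration; the paper's exact estimator may differ.

```python
def amplitude_envelope(signal, win=32):
    # Moving average of |x| over a centred window: a crude but standard
    # envelope estimate that can then be imposed on the EGG-derived signal.
    n = len(signal)
    env = []
    for i in range(n):
        lo, hi = max(0, i - win // 2), min(n, i + win // 2)
        seg = signal[lo:hi]
        env.append(sum(abs(v) for v in seg) / len(seg))
    return env
```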
ROBUST ONLINE MULTIBAND DRIFT ESTIMATION IN ELECTROPHYSIOLOGY DATA.
Pub Date: 2023-06-01 Epub Date: 2023-05-05 DOI: 10.1109/icassp49357.2023.10095487
Charlie Windolf, Angelique C Paulk, Yoav Kfir, Eric Trautmann, Domokos Meszéna, William Muñoz, Irene Caprara, Mohsen Jamali, Julien Boussard, Ziv M Williams, Sydney S Cash, Liam Paninski, Erdem Varol
{"title":"ROBUST ONLINE MULTIBAND DRIFT ESTIMATION IN ELECTROPHYSIOLOGY DATA.","authors":"Charlie Windolf, Angelique C Paulk, Yoav Kfir, Eric Trautmann, Domokos Meszéna, William Muñoz, Irene Caprara, Mohsen Jamali, Julien Boussard, Ziv M Williams, Sydney S Cash, Liam Paninski, Erdem Varol","doi":"10.1109/icassp49357.2023.10095487","DOIUrl":"10.1109/icassp49357.2023.10095487","url":null,"abstract":"<p><p>High-density electrophysiology probes have opened new possibilities for systems neuroscience in human and non-human animals, but probe motion poses a challenge for downstream analyses, particularly in human recordings. We improve on the state of the art for tracking this motion with four major contributions. First, we extend previous decentralized methods to use <i>multiband</i> information, leveraging the local field potential (LFP) in addition to spikes. Second, we show that the LFP-based approach enables registration at <i>sub-second</i> temporal resolution. Third, we introduce an efficient <i>online</i> motion tracking algorithm, enabling the method to scale up to longer and higher-resolution recordings, and possibly facilitating real-time applications. Finally, we improve the <i>robustness</i> of the approach by introducing a structure-aware objective and simple methods for adaptive parameter selection. Together, these advances enable fully automated scalable registration of challenging datasets from human and mouse.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. 
ICASSP (Conference)","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10308877/pdf/nihms-1910468.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9741857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
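At its simplest, the registration idea above picks, for each new time bin, the probe displacement that maximizes the correlation between activity histograms along the depth axis. A toy, offline version of that step (the bin counts are invented; the actual method is decentralized, multiband, and online):

```python
def estimate_shift(ref_hist, new_hist, max_shift=5):
    # Return the displacement s (in depth bins) of new_hist relative to
    # ref_hist that maximizes their overlap correlation.
    n = len(ref_hist)
    best_shift, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = sum(ref_hist[i] * new_hist[i + s]
                    for i in range(n) if 0 <= i + s < n)
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift
```

Subtracting the estimated shift from spike depths re-registers the recording; the online algorithm in the paper updates this estimate incrementally instead of recomputing it.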
MULTIMODAL MICROSCOPY IMAGE ALIGNMENT USING SPATIAL AND SHAPE INFORMATION AND A BRANCH-AND-BOUND ALGORITHM.
Pub Date: 2023-06-01 Epub Date: 2023-05-05 DOI: 10.1109/icassp49357.2023.10096185
Shuonan Chen, Bovey Y Rao, Stephanie Herrlinger, Attila Losonczy, Liam Paninski, Erdem Varol
{"title":"MULTIMODAL MICROSCOPY IMAGE ALIGNMENT USING SPATIAL AND SHAPE INFORMATION AND A BRANCH-AND-BOUND ALGORITHM.","authors":"Shuonan Chen, Bovey Y Rao, Stephanie Herrlinger, Attila Losonczy, Liam Paninski, Erdem Varol","doi":"10.1109/icassp49357.2023.10096185","DOIUrl":"10.1109/icassp49357.2023.10096185","url":null,"abstract":"<p><p>Multimodal microscopy experiments that image the same population of cells under different experimental conditions have become a widely used approach in systems and molecular neuroscience. The main obstacle is to align the different imaging modalities to obtain complementary information about the observed cell population (e.g., gene expression and calcium signal). Traditional image registration methods perform poorly when only a small subset of cells are present in both images, as is common in multimodal experiments. We cast multimodal microscopy alignment as a cell subset matching problem. To solve this non-convex problem, we introduce an <i>efficient and globally optimal branch-and-bound</i> algorithm to find subsets of point clouds that are in rotational alignment with each other. In addition, we use complementary information about cell shape and location to compute the matching likelihood of cell pairs in two imaging modalities to further prune the optimization search tree. Finally, we use the maximal set of cells in rigid rotational alignment to seed image deformation fields to obtain a final registration result. Our framework performs better than the state-of-the-art histology alignment approaches regarding matching quality and is faster than manual alignment, providing a viable solution to improve the throughput of multimodal microscopy experiments.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. 
ICASSP (Conference)","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10308861/pdf/nihms-1910467.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10519859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
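The core subproblem above, finding the rotation that aligns the largest subset of one point cloud with another, can be illustrated with a brute-force angle scan in 2-D. This stand-in ignores the branch-and-bound pruning, the shape likelihoods, and the deformation step, and all coordinates and tolerances are invented.

```python
import math

def rotate(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for (x, y) in points]

def match_score(src, dst, tol=1e-3):
    # Number of rotated source cells landing within tol of some target cell.
    return sum(1 for p in src if any(math.dist(p, q) < tol for q in dst))

def best_rotation(src, dst, n_angles=3600):
    # Exhaustive scan over candidate angles; the paper's branch-and-bound
    # search prunes most of this work while staying globally optimal.
    best_theta, best = 0.0, -1
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        score = match_score(rotate(src, theta), dst)
        if score > best:
            best_theta, best = theta, score
    return best_theta, best
```

Note the score counts matched cells rather than summing distances, which is what makes the search robust to cells present in only one modality.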
ONLINE BINAURAL SPEECH SEPARATION OF MOVING SPEAKERS WITH A WAVESPLIT NETWORK.
Cong Han, Nima Mesgarani
{"title":"ONLINE BINAURAL SPEECH SEPARATION OF MOVING SPEAKERS WITH A WAVESPLIT NETWORK.","authors":"Cong Han,&nbsp;Nima Mesgarani","doi":"10.1109/icassp49357.2023.10095695","DOIUrl":"https://doi.org/10.1109/icassp49357.2023.10095695","url":null,"abstract":"<p><p>Binaural speech separation in real-world scenarios often involves moving speakers. Most current speech separation methods use utterance-level permutation invariant training (u-PIT) for training. In inference time, however, the order of outputs can be inconsistent over time particularly in long-form speech separation. This situation which is referred to as the speaker swap problem is even more problematic when speakers constantly move in space and therefore poses a challenge for consistent placement of speakers in output channels. Here, we describe a real-time binaural speech separation model based on a Wavesplit network to mitigate the speaker swap problem for moving speaker separation. Our model computes a speaker embedding for each speaker at each time frame from the mixed audio, aggregates embeddings using online clustering, and uses cluster centroids as speaker profiles to track each speaker throughout the long duration. Experimental results on reverberant, long-form moving multitalker speech separation show that the proposed method is less prone to speaker swap and achieves comparable performance with u-PIT based models with ground truth tracking in both separation accuracy and preserving the interaural cues.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. 
ICASSP (Conference)","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10417534/pdf/nihms-1919649.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10008233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
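The tracking step above, assigning each frame's output channels to persistent speaker identities, can be sketched as greedy nearest-centroid matching with an online (exponential moving average) update. The distance metric, the learning rate, and the one-channel-per-speaker constraint are illustrative assumptions.

```python
def assign_and_update(centroids, frame_embeddings, lr=0.1):
    # Match each embedding in the current frame to its nearest unused
    # speaker centroid, then pull that centroid toward the embedding.
    # The returned channel order stays consistent across frames as long
    # as the centroids track the true speakers.
    order, used = [], set()
    for emb in frame_embeddings:
        best_k, best_d = None, float("inf")
        for k, cen in enumerate(centroids):
            if k in used:
                continue
            d = sum((a - b) ** 2 for a, b in zip(emb, cen))
            if d < best_d:
                best_k, best_d = k, d
        used.add(best_k)
        order.append(best_k)
        centroids[best_k] = [c + lr * (a - c)
                             for a, c in zip(emb, centroids[best_k])]
    return order
```

A swapped frame is detected because its embeddings land closer to the other speaker's centroid, so the outputs can be reordered before playback.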
PHONEME-LEVEL BERT FOR ENHANCED PROSODY OF TEXT-TO-SPEECH WITH GRAPHEME PREDICTIONS.
Pub Date: 2023-06-01 Epub Date: 2023-05-05 DOI: 10.1109/icassp49357.2023.10097074
Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani
{"title":"PHONEME-LEVEL BERT FOR ENHANCED PROSODY OF TEXT-TO-SPEECH WITH GRAPHEME PREDICTIONS.","authors":"Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani","doi":"10.1109/icassp49357.2023.10097074","DOIUrl":"10.1109/icassp49357.2023.10097074","url":null,"abstract":"<p><p>Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sup-phoneme-level and jointly trained with phonemes, making them inefficient for the downstream TTS task where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. Subjective evaluations show that our phoneme-level BERT encoder has significantly improved the mean opinion scores (MOS) of rated naturalness of synthesized speech compared with the state-of-the-art (SOTA) StyleTTS baseline on out-of-distribution (OOD) texts.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP (Conference)","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10417533/pdf/nihms-1919648.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10008229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
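The pretext task above pairs regular masked-phoneme prediction with grapheme prediction at every position. Data preparation for one training example might look like the sketch below; the mask rate, token names, and the None-for-unmasked convention are assumptions, not the authors' code.

```python
import random

def make_plbert_example(phonemes, graphemes, mask_token="<mask>",
                        p=0.15, rng=None):
    # Mask ~15% of phoneme tokens. Targets: the original phoneme at each
    # masked slot, plus the corresponding grapheme at *every* slot.
    rng = rng or random.Random(0)
    inputs, phoneme_targets = [], []
    for ph in phonemes:
        if rng.random() < p:
            inputs.append(mask_token)
            phoneme_targets.append(ph)   # predict only where masked
        else:
            inputs.append(ph)
            phoneme_targets.append(None)
    grapheme_targets = list(graphemes)   # predicted everywhere
    return inputs, phoneme_targets, grapheme_targets
```

Because the encoder consumes phonemes only, the grapheme head can be dropped after pre-training, leaving a TTS-ready phoneme encoder.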
ROBUST TIME SERIES RECOVERY AND CLASSIFICATION USING TEST-TIME NOISE SIMULATOR NETWORKS.
Eun Som Jeon, Suhas Lohit, Rushil Anirudh, Pavan Turaga
{"title":"ROBUST TIME SERIES RECOVERY AND CLASSIFICATION USING TEST-TIME NOISE SIMULATOR NETWORKS.","authors":"Eun Som Jeon,&nbsp;Suhas Lohit,&nbsp;Rushil Anirudh,&nbsp;Pavan Turaga","doi":"10.1109/icassp49357.2023.10096888","DOIUrl":"https://doi.org/10.1109/icassp49357.2023.10096888","url":null,"abstract":"<p><p>Time-series are commonly susceptible to various types of corruption due to sensor-level changes and defects which can result in missing samples, sensor and quantization noise, unknown calibration, unknown phase shifts etc. These corruptions cannot be easily corrected as the noise model may be unknown at the time of deployment. This also results in the inability to employ pre-trained classifiers, trained on (clean) source data. In this paper, we present a general framework and models for time-series that can make use of (unlabeled) test samples to estimate the noise model-entirely at test time. To this end, we use a coupled decoder model and an additional neural network which acts as a learned noise model simulator. We show that the framework is able to \"clean\" the data so as to match the source training data statistics and the cleaned data can be directly used with a pre-trained classifier for robust predictions. We perform empirical studies on diverse application domains with different types of sensors, clearly demonstrating the effectiveness and generality of this method.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. 
ICASSP (Conference)","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10426275/pdf/nihms-1920707.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10075806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
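The idea above, using unlabeled test samples to estimate the corruption and undo it, can be shown in a drastically simplified form: assume the unknown corruption is affine (y = a*x + b) and fit it by matching the test batch's moments to the clean source statistics. The real method learns a neural noise simulator instead; the function name and the affine assumption are purely illustrative.

```python
def fit_affine_cleaner(source_stats, test_samples):
    # source_stats: (mean, std) of the clean training data.
    # Estimate y = a*x + b from the (unlabeled) test batch, then return
    # the inverse map that restores source-like statistics.
    src_mean, src_std = source_stats
    n = len(test_samples)
    t_mean = sum(test_samples) / n
    t_std = (sum((v - t_mean) ** 2 for v in test_samples) / n) ** 0.5
    a = t_std / src_std
    b = t_mean - a * src_mean
    return lambda y: (y - b) / a
```

The cleaned values can then be fed unchanged to the classifier that was trained on source data.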
UNSUPERVISED CLUSTERING AND ANALYSIS OF CONTRACTION-DEPENDENT FETAL HEART RATE SEGMENTS.
Pub Date: 2022-05-01 Epub Date: 2022-04-27 DOI: 10.1109/icassp43922.2022.9747598
Liu Yang, Cassandra Heiselman, J Gerald Quirk, Petar M Djurić
{"title":"UNSUPERVISED CLUSTERING AND ANALYSIS OF CONTRACTION-DEPENDENT FETAL HEART RATE SEGMENTS.","authors":"Liu Yang,&nbsp;Cassandra Heiselman,&nbsp;J Gerald Quirk,&nbsp;Petar M Djurić","doi":"10.1109/icassp43922.2022.9747598","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747598","url":null,"abstract":"<p><p>The computer-aided interpretation of fetal heart rate (FHR) and uterine contraction (UC) has not been developed well enough for wide use in delivery rooms. The main challenges still lie in the lack of unclear and nonstandard labels for cardiotocography (CTG) recordings, and the timely prediction of fetal state during monitoring. Rather than traditional supervised approaches to FHR classification, this paper demonstrates a way to understand the UC-dependent FHR responses in an unsupervised manner. In this work, we provide a complete method for FHR-UC segment clustering and analysis via the Gaussian process latent variable model, and density-based spatial clustering. We map the UC-dependent FHR segments into a space with a visual dimension and propose a trajectory-based FHR interpretation method. Three metrics of FHR trajectory are defined and an open-access CTG database is used for testing the proposed method.</p>","PeriodicalId":74518,"journal":{"name":"Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP (Conference)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9415917/pdf/nihms-1778247.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33444761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
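The density-based spatial clustering step above can be illustrated with a minimal DBSCAN-style procedure on scalar values; the paper clusters 2-D GPLVM latents, and the eps, min_pts, and data below are invented for illustration.

```python
def dbscan_1d(points, eps=1.0, min_pts=3):
    # Minimal DBSCAN: grow clusters from "core" points (those with at
    # least min_pts neighbours within eps); isolated points become noise.
    labels = [None] * len(points)
    cluster = -1
    for i, p in enumerate(points):
        if labels[i] is not None:
            continue
        neigh = [j for j, q in enumerate(points) if abs(p - q) <= eps]
        if len(neigh) < min_pts:
            labels[i] = -1              # noise (may later become a border)
            continue
        cluster += 1
        labels[i] = cluster
        frontier = [j for j in neigh if j != i]
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise joins as border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neigh_j = [k for k, q in enumerate(points)
                       if abs(points[j] - q) <= eps]
            if len(neigh_j) >= min_pts:  # only core points keep expanding
                frontier.extend(k for k in neigh_j if labels[k] is None)
    return labels
```

Density-based clustering needs no preset cluster count, which suits unlabeled CTG segments where the number of response types is unknown.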