{"title":"L3DAS21 Challenge: Machine Learning for 3D Audio Signal Processing","authors":"E. Guizzo, Riccardo F. Gramaccioni, Saeid Jamili, C. Marinoni, Edoardo Massaro, Claudia Medaglia, Giuseppe Nachira, Leonardo Nucciarelli, Ludovica Paglialunga, Marco Pennese, Sveva Pepe, Enrico Rocchi, A. Uncini, D. Comminiello","doi":"10.1109/MLSP52302.2021.9596248","DOIUrl":"https://doi.org/10.1109/MLSP52302.2021.9596248","url":null,"abstract":"The L3DAS21 Challenge (www.l3das.com/mlsp2021) is aimed at encouraging and fostering collaborative research on machine learning for 3D audio signal processing, with a particular focus on 3D speech enhancement (SE) and 3D sound localization and detection (SELD). Alongside the challenge, we release the L3DAS21 dataset, a 65-hour 3D audio corpus, accompanied by a Python API that facilitates data usage and the results submission stage. Usually, machine learning approaches to 3D audio tasks are based on single-perspective Ambisonics recordings or on arrays of single-capsule microphones. We propose, instead, a novel multichannel audio configuration based on multiple-source and multiple-perspective Ambisonics recordings, performed with an array of two first-order Ambisonics microphones. To the best of our knowledge, this is the first time that a dual-mic Ambisonics configuration is used for these tasks. 
We provide baseline models and results for both tasks, obtained with state-of-the-art architectures: FaSNet for SE and SELDnet for SELD.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128141916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Complexity Measures for Deep Learning Generalization in Medical Image Analysis","authors":"Aleksandar Vakanski, Min Xian","doi":"10.1109/MLSP52302.2021.9596501","DOIUrl":"https://doi.org/10.1109/MLSP52302.2021.9596501","url":null,"abstract":"The generalization error of deep learning models for medical image analysis often increases on images collected with different devices for data acquisition, device settings, or patient population. A better understanding of the generalization capacity on new images is crucial for earning clinicians' trust. Although significant efforts have recently been directed toward establishing generalization bounds and complexity measures, there is still a significant discrepancy between the predicted and actual generalization performance. Moreover, related large-scale empirical studies have been primarily based on validation with general-purpose image datasets. This paper presents an empirical study that investigates the correlation between 25 complexity measures and the generalization abilities of deep learning classifiers for breast ultrasound images. The results indicate that PAC-Bayes flatness and path norm measures produce the most consistent explanation for the combination of models and data. 
We also report that a multi-task classification and segmentation approach for breast images is conducive to improved generalization.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126946096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Data Visualization and Denoising with Integrated Diffusion","authors":"Manik Kuchroo, Abhinav Godavarthi, Alexander Tong, Guy Wolf, Smita Krishnaswamy","doi":"10.1109/mlsp52302.2021.9596214","DOIUrl":"https://doi.org/10.1109/mlsp52302.2021.9596214","url":null,"abstract":"We propose a method called integrated diffusion for combining multimodal data, gathered via different sensors on the same system, to create an integrated data diffusion operator. As real-world data suffers from both local and global noise, we introduce mechanisms to optimally calculate a diffusion operator that reflects the combined information in the data by maintaining low-frequency eigenvectors of each modality both globally and locally. We show the utility of this integrated operator in denoising and visualizing multimodal toy data as well as multi-omic data generated from blood cells, measuring both gene expression and chromatin accessibility. Our approach better visualizes the geometry of the integrated data and captures known cross-modality associations. More generally, integrated diffusion is broadly applicable to multimodal datasets generated by noisy sensors in a variety of fields.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131948297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GRAD-CAM Guided Channel-Spatial Attention Module for Fine-Grained Visual Classification","authors":"Shuai Xu, Dongliang Chang, Jiyang Xie, Zhanyu Ma","doi":"10.1109/mlsp52302.2021.9596481","DOIUrl":"https://doi.org/10.1109/mlsp52302.2021.9596481","url":null,"abstract":"Fine-grained visual classification (FGVC) is becoming an important research field, due to its wide applications and the rapid development of computer vision technologies. The current state-of-the-art (SOTA) methods in FGVC usually employ attention mechanisms to first capture the semantic parts and then discover the subtle differences between distinct classes. Existing attention modules have significantly improved classification performance, but they are poorly guided, since part-based detectors in FGVC depend on the network's learning ability without the supervision of part annotations. As obtaining such part annotations is labor-intensive, visual localization and explanation methods, such as gradient-weighted class activation mapping (Grad-CAM), can be utilized to supervise the attention mechanism. In this paper, we propose a Grad-CAM guided channel-spatial attention module for FGVC, which employs Grad-CAM to supervise and constrain the attention weights by generating coarse localization maps. To demonstrate the effectiveness of the proposed method, we conduct comprehensive experiments on three popular FGVC datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. The proposed method outperforms the SOTA attention modules in the FGVC task. 
In addition, visualizations of the feature maps demonstrate the superiority of the proposed method against the SOTA approaches.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126181784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Variational Generative Models for Audio-Visual Speech Separation","authors":"V. Nguyen, M. Sadeghi, E. Ricci, Xavier Alameda-Pineda","doi":"10.1109/mlsp52302.2021.9596406","DOIUrl":"https://doi.org/10.1109/mlsp52302.2021.9596406","url":null,"abstract":"In this paper, we are interested in audio-visual speech separation given a single-channel audio recording as well as visual information (lip movements) associated with each speaker. We propose an unsupervised technique based on audio-visual generative modeling of clean speech. More specifically, during training, a latent variable generative model is learned from clean speech spectra using a variational auto-encoder (VAE). To better utilize the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) as well as the visual data. The visual modality also serves as a prior for the latent variables, through a visual network. At test time, the learned generative model (for both speaker-independent and speaker-dependent scenarios) is combined with an unsupervised non-negative matrix factorization (NMF) variance model for background noise. All the latent variables and noise parameters are then estimated by a Monte Carlo expectation-maximization algorithm. 
Our experiments show that the proposed unsupervised VAE-based method yields better separation performance than NMF-based approaches as well as a supervised deep learning-based technique.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126925007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Semi-Supervised Generative Adversarial Network for Prediction of Genetic Disease Outcomes","authors":"C. Davi, U. Braga-Neto","doi":"10.1109/mlsp52302.2021.9596351","DOIUrl":"https://doi.org/10.1109/mlsp52302.2021.9596351","url":null,"abstract":"For most diseases, building large databases of labeled genetic data is an expensive and time-consuming task. To address this, we introduce the genetic Generative Adversarial Network (gGAN), a semi-supervised approach based on an innovative GAN architecture to create large synthetic genetic datasets starting with a small amount of labeled data and a large amount of unlabeled data. Our goal is to create a mechanism able to increase the sample size of the labeled data and generalize learning over different populations while keeping awareness of the quality of its own predictions. The proposed model achieved satisfactory results using real genetic data from different datasets and populations, in which the test populations may not have the same genetic profiles. The proposed model is self-aware and capable of determining whether a new genetic profile has enough compatibility with the data on which the network was trained and is thus suitable for prediction. The code and datasets used can be found at https://github.com/caio-davi/gGAN","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124120571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preferential Batch Bayesian Optimization","authors":"E. Siivola, Akash Kumar Dhaka, M. R. Andersen, Javier I. González, Pablo G. Moreno, Aki Vehtari","doi":"10.1109/mlsp52302.2021.9596494","DOIUrl":"https://doi.org/10.1109/mlsp52302.2021.9596494","url":null,"abstract":"Most research in Bayesian optimization (BO) has focused on direct feedback scenarios, where one has access to exact values of some expensive-to-evaluate objective. This direction has been mainly driven by the use of BO in machine learning hyperparameter configuration problems. However, in domains such as modelling human preferences, A/B tests, or recommender systems, there is a need for methods that can replace direct feedback with preferential feedback, obtained via rankings or pairwise comparisons. In this work, we present preferential batch Bayesian optimization (PBBO), a new framework that allows finding the optimum of a latent function of interest, given any type of parallel preferential feedback for a group of two or more points. We do so by using a Gaussian process model with a likelihood specially designed to enable parallel and efficient data collection mechanisms, which are key in modern machine learning. We show how the acquisitions developed under this framework generalize and augment previous approaches in Bayesian optimization, expanding the use of these techniques to a wider range of domains. 
An extensive simulation study shows the benefits of this approach on both simulated functions and four real-world datasets.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"207 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120899791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}