{"title":"Two-Point Neurons for Efficient Multimodal Speech Enhancement","authors":"M. Raza, Khubaib Ahmed, Junaid Muzaffar, Ahsan Adeel","doi":"10.1109/ICASSPW59220.2023.10193457","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193457","url":null,"abstract":"Here we present a two-point neuron-inspired deep convolutional net (DCN) with 18 convolutional layers for multimodal speech enhancement (MM-SE) and compare it against a conventional point neuron-inspired DCN in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). We show that the two-point neuron-driven DCN performs comparably to the point-neuron-driven DCN while using only ≈0.2% of its neurons at any time during training.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132829459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble Methods For Enhanced Covid-19 CT Scan Severity Analysis","authors":"A. Thyagachandran, H. Murthy","doi":"10.1109/ICASSPW59220.2023.10193538","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193538","url":null,"abstract":"Computed Tomography (CT) scans provide a high-resolution image of the lungs, allowing clinicians to identify the severity of infections in COVID-19 patients. This paper presents a domain knowledge-based pipeline for extracting infection regions from COVID-19 patients using a combination of image-processing algorithms and a pre-trained UNET model. Then, an infection rate-based feature vector is generated for each CT scan. The infection severity is then classified into four categories using an ensemble of three machine-learning models: Random Forest, Support Vector Machines, and Extremely Randomized Trees. The proposed system is evaluated on the validation and test datasets with macro F1 scores of 58% and 46.31%, respectively. Our proposed model achieved 3rd place in the severity detection challenge as part of the IEEE ICASSP 2023: AI-enabled Medical Image Analysis Workshop and COVID-19 Diagnosis Competition (AI-MIA-COV19D). The implementation of the proposed system is available at https://github.com/aanandt/Enhancing-COVID19-Severity-Analysis-through-Ensemble-Methods.git","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131932613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Reflective Beamforming For Non-Terrestrial Networks Under Thermal Deformations","authors":"D. Rakhimov, Bile Peng, Eduard Axel Jorswieck, M. Haardt","doi":"10.1109/ICASSPW59220.2023.10193299","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193299","url":null,"abstract":"In this paper, we present a beamforming method that is robust against thermal deformations for non-terrestrial reconfigurable intelligent surfaces (RIS). We analytically derive the expressions for the worst-case bound on perturbations of the covariance matrix and the corresponding steering vectors as functions of possible displacements of RIS elements. We apply these bounds during the optimization procedure to find the beamforming coefficients that are robust to thermal deformations. Moreover, we present a simple heuristic to obtain the constant modulus beamforming coefficients from the optimal beamforming via an array thinning operation. The simulation results confirm the robustness of the proposed solution against random but bounded perturbations caused by thermal deformations of the reflective surface.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115765038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dataset for Foreground Speech Analysis With Smartwatches In Everyday Home Environments","authors":"Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David F. Harwath, Edison Thomaz","doi":"10.1109/ICASSPW59220.2023.10192949","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10192949","url":null,"abstract":"Acoustic sensing has proved effective as a foundation for applications in health and human behavior analysis. In this work, we focus on detecting in-person social interactions in naturalistic settings from audio captured by a smartwatch. As a first step, it is critical to distinguish the speech of the individual wearing the watch (foreground speech) from all other sounds nearby, such as speech from other individuals and ambient sounds. Given the considerable burden of collecting and annotating real-world training data and the lack of existing online data resources, this paper introduces a dataset for foreground speech detection of users wearing a smartwatch. The data is collected from 39 participants interacting with family members in real homes. We then present a benchmark study for the dataset with different test setups. Furthermore, we explore a model-free heuristic method to identify foreground instances based on transfer learning embeddings.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115129508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency Asynchronous Noma In LEO Satellite Communication Systems","authors":"Joohyun Son, Jehyun Heo, Hyunwook Lee, Seungwoo Sung, Minchul Hong, Hanwoong Kim, Gayeon Ahn, D. Hong","doi":"10.1109/ICASSPW59220.2023.10193732","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193732","url":null,"abstract":"In this paper, frequency asynchronous non-orthogonal multiple access (FA-NOMA) is applied to the low earth orbit (LEO) satellite communication (SatCom) system. There have been attempts to employ power domain NOMA (P-NOMA) to support large numbers of users in SatCom systems. However, P-NOMA is not beneficial in LEO SatCom environments where the power difference between users is small. In comparison, FA-NOMA utilizes intentional frequency offsets rather than environmental characteristics, making it a suitable technique for supporting large numbers of users in the LEO satellite environments.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"313 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115445759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Translation to Sign Language Using Post-Translation Replacement Without Placeholders","authors":"Taro Miyazaki, Naoki Nakatani, Tsubasa Uchida, H. Kaneko, Masanori Sano","doi":"10.1109/ICASSPW59220.2023.10193419","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193419","url":null,"abstract":"Sign language is typically the first language for those who are born deaf or who lose their hearing in early childhood. To provide important information for these individuals, it is better to use sign language than to transcribe spoken languages. We have been developing a system that translates Japanese into Japanese Sign Language (JSL) and then generates computer graphics (CG) animation of JSL. In this paper, we propose a machine translation method for translating Japanese into JSL. The proposed method is based on an encoder-decoder model that utilizes a pre-trained model as the encoder, and the proper names in the translation result are revised using a dictionary by means of a post-translation replacement method without placeholders. Our experimental results demonstrate that using the pre-trained model as the encoder and performing the post-translation replacement of proper names both contributed to improving the translation quality.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122990404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey of Datasets, Applications, and Models for IMU Sensor Signals","authors":"Aparajita Saraf, Seungwhan Moon, Andrea Madotto","doi":"10.1109/ICASSPW59220.2023.10193365","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193365","url":null,"abstract":"Inertial Measurement Units (IMUs) are small, low-cost sensors that can measure accelerations and angular velocities, making them valuable tools for a variety of applications, including robotics, virtual reality, and healthcare. With the advent of deep learning, there has been a surge of interest in using IMU data to train DNN models for various applications. In this paper, we survey the state-of-the-art ML models including deep neural network models and applications for IMU sensors. We first provide an overview of IMU sensors and the types of data they generate. We then review the most popular models for IMU data, including convolutional neural networks, recurrent neural networks, and attention-based models. We also discuss the challenges associated with training deep neural networks on IMU data, such as data scarcity, noise, and sensor drift. Finally, we present a comprehensive review of the most prominent applications of deep neural networks for IMU data, including human activity recognition, gesture recognition, gait analysis, and fall detection. Overall, this survey provides a comprehensive overview of the state-of-the-art deep neural network models and applications for IMU sensors and highlights the challenges and opportunities in this rapidly evolving field.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125153490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion Editing Tool for Reproducing Grammatical Elements of Japanese Sign Language Avatar Animation","authors":"Tsubasa Uchida, Naoki Nakatani, Taro Miyazaki, H. Kaneko, Masanori Sano","doi":"10.1109/ICASSPW59220.2023.10193198","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193198","url":null,"abstract":"For deaf and hard of hearing people whose native language is sign language, it is necessary to provide information not only with subtitles but also with sign language. One means of providing information in sign language is to use animated avatars. Therefore, we have developed a system that generates a Japanese Sign Language (JSL) avatar animation from Japanese sentences utilizing Japanese-to-JSL translation and a motion-data-driven animation generation method. In this paper, we propose a motion editing tool that can adjust grammatical elements of JSL by editing the individual motion data that form a JSL sentence. An evaluation experiment shows that editing the motion speed and blend span of multiple words corresponding to the delimitation of phrases and clauses can reduce the error rate of understanding JSL avatar animations.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125784347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart Selection of Useful Insights from Wearables","authors":"Allmin Pradhap Singh Susaiyah, Aki Härmä, Simone Balloccu, E. Reiter, M. Petkovic","doi":"10.1109/ICASSPW59220.2023.10193140","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193140","url":null,"abstract":"The popularity of wearable devices equipped with inertial measurement units (IMUs) and optical sensors has increased in recent years. These sensors provide valuable activity and heart-rate data that, when analysed across multiple users and over time, can offer profound insights into individual lifestyle habits. However, the high dimensionality of such data and user preference dynamics present significant challenges for mining useful insights. This paper proposes a novel approach that employs natural language processing to mine insights from wearable data, utilising a neural network model that leverages end-to-end feedback from users. Results demonstrate that this approach effectively increased daily step counts among users, showcasing the potential of this method for optimising health and wellness outcomes.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130108538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lung Segmentation Enhances COVID-19 Detection","authors":"R. Turnbull","doi":"10.1109/ICASSPW59220.2023.10193492","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193492","url":null,"abstract":"Improving automated analysis of medical imaging will give clinicians more options in providing care for patients. The 2023 AI-enabled Medical Image Analysis Workshop and Covid-19 Diagnosis Competition (AI-MIA-COV19D) provides an opportunity to test and refine machine learning methods for detecting the presence and severity of COVID-19 in patients from CT scans. This paper presents version 2 of Cov3d, a deep learning model submitted in the 2022 competition. The model has been improved through a preprocessing step that segments the lungs in the CT scan and crops the input to this region. It achieves a macro F1 score of 84.92% for predicting the presence of COVID-19 in the CT scans on the test dataset, placing second in the competition. The model achieved a macro F1 score of 59.06% on the test dataset for predicting the severity of COVID-19, making it the best-performing model for that task of the competition.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130130743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}