{"title":"Exploitation Of Single-Channel Space-Borne SAR Data for Ship Targets Imaging and Motion Parameters Estimation","authors":"A. Testa, D. Pastina, M. Zavagli, F. Santi, C. Pratola, M. Corvino","doi":"10.1109/ICASSPW59220.2023.10193026","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193026","url":null,"abstract":"Nowadays, space-borne Synthetic Aperture Radar (SAR) systems can provide very high-resolution images, representing a formidable tool for maritime surveillance applications. Nevertheless, ship targets usually appear defocused in level-1 SAR images due to their motion. Ship target refocusing can be achieved via Inverse SAR (ISAR) techniques, and a number of autofocus approaches have been proposed and tested in the past. However, estimating the ship’s full dynamics (translation and rotation) provides strategic information and also makes it possible to scale the images in the uniform range-cross-range plane. In this work, the effectiveness of suitable single-channel ISAR techniques at estimating the ship velocity and at correctly scaling the images is assessed by means of Cosmo-SkyMed (CSK) data, exploiting the Automatic Identification System (AIS) information as the reference ground truth.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117170517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Health Profiling Framework for Children Leveraging Multimodal Learning Based on Ambient Sensor Signals","authors":"Zhihan Jiang, Cong Xie, Edith C. H. Ngai","doi":"10.1109/ICASSPW59220.2023.10192968","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10192968","url":null,"abstract":"Traditional methods for health profiling are usually expensive and require specialized expertise. The growing prevalence and development of wearable devices have made it feasible to collect ambient sensor signals, providing us with new opportunities to profile children’s health in a cost-effective and comprehensive manner. Inspired by recent works in multimodal learning, we propose a health profiling framework for children. First, we extract context and motion patterns from their personal and family characteristics and acceleration signals. Then, context and motion embeddings are generated by two encoders and input into a lightweight neural network to profile children’s health from the perspectives of physical activity intensity, physical functioning, health confidence, psychosocial functioning, resilience, and connectedness. We evaluate the proposed method on real-world datasets, and the results show its outstanding performance. Specifically, the context pattern is effective in profiling children’s health, while the motion pattern is significantly effective in assessing children’s physical activity intensity.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127275741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an FPGA Implementation of IOT-Based Multi-Modal Hearing AID System","authors":"Godwin Enemali, A. Bishnu, T. Ratnarajah, T. Arslan","doi":"10.1109/ICASSPW59220.2023.10192936","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10192936","url":null,"abstract":"This paper presents the implementation of a cloud-based multimodal hearing aid (HA) system. It identifies major processing blocks to be implemented on an embedded Field Programmable Gate Array (FPGA) and focuses on aspects of a custom block that deals with the integration of multiple input sources of the HA. In particular, the integration block deals with the combination of audio and video (AV) data at different sampling rates, each encoded with a different number of bits. It realizes seamless combination of AV data by assigning a scaled number of bits to each component in the combined signal to achieve real-time transmission of all data sources. In addition, key issues such as security are discussed, and a plan to address them in future work is stated.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125942525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Alignment Between Sign Language Videos And Motion Capture Data: A Motion Energy-Based Approach","authors":"Fabrizio Nunnari, Mina Ameli, Shailesh Mishra","doi":"10.1109/ICASSPW59220.2023.10193528","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193528","url":null,"abstract":"In this paper, we propose a method for the automatic alignment of sign language videos and their corresponding motion capture data, useful for the preparation of multi-modal sign language corpora. First, we extract an estimate of the motion energy from both the video and the motion capture data. Second, we align the two curves to minimize their distance. Our tests show that it is possible to achieve a mean absolute error as low as 1.11 frames using optical flow for video energy extraction and a set of 22 bones for skeletal energy extraction.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123551338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Large-Scale Probabilistic Seizure Detection with a Tensor-Network Kalman Filter for LS-SVM","authors":"S.J.S. de Rooij, K. Batselier, B. Hunyadi","doi":"10.1109/ICASSPW59220.2023.10193615","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193615","url":null,"abstract":"Recent advancements in wearable EEG devices have highlighted the importance of accurate seizure detection algorithms, yet the ever-increasing size of the generated datasets poses a significant challenge to existing seizure detection methods based on kernel machines. Typically, this problem is mitigated by significantly undersampling the majority class, but in practice, these methods tend to suffer from too many false alarms. Recent works have proposed tensor networks to enable large-scale classification with kernel machines. In this paper, we explore the use of a probabilistic tensor method, the tensor-network Kalman filter for LS-SVMs (TNKF-LSSVM), for seizure detection, as we hypothesize that using more data will improve the detection performance. We show that the TNKF-LSSVM performs comparably to a regular LS-SVM in detecting seizures when both are trained on the same dataset. Additionally, the TNKF-LSSVM can provide meaningful uncertainty quantification, and it is able to handle large-scale datasets beyond the capabilities of the LS-SVM (i.e., $N > 10^{5}$). However, for the presented model configuration, detection performance does not seem to improve with more input data.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125568594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Air-To-Ground Communications Beyond 5G: The Formation Control of UAV Swarm","authors":"Xiao Fan, Peiran Wu, M. Xia","doi":"10.1109/ICASSPW59220.2023.10193143","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193143","url":null,"abstract":"This work proposes a novel air-to-ground communication model consisting of aerial base stations served by unmanned aerial vehicles (UAVs) and terrestrial user equipments (UEs) by integrating the technique of coordinated multi-point (CoMP) transmission with the theory of stochastic geometry. An effective UAV formation control scheme is also developed using multi-agent system theory to ensure that collaborative UAVs can efficiently reach target spatial positions for mission execution. The case study demonstrates that the proposed UAV formation control strategy for static UEs could efficiently shape the geometric pattern related to our CoMP transmission scheme.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116234556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On The Complexity of Non-Coherent Acquisition of Chirp Spread Spectrum Signals","authors":"D. Egea, J. López-Salcedo, G. Seco-Granados","doi":"10.1109/ICASSPW59220.2023.10193476","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193476","url":null,"abstract":"During the past few decades, the use of GNSSs has become the primary and sometimes only way of providing a positioning solution for many outdoor applications. Furthermore, GNSS is playing an important role in the development of smart cities and Internet of Things (IoT) applications. Unfortunately, GNSS is a very resource-hungry technology, which challenges its adoption in many IoT applications. All these ingredients boil down to the need for alternative positioning solutions to back up GNSS. The use of low-Earth orbit (LEO) satellite constellations has been considered in the literature for that purpose. Chirp Spread Spectrum (CSS) modulation is a different approach from classic GNSS to enable positioning with LEO satellites. This type of signal is intended to address low-complexity positioning for IoT devices, tackling the complexity issue of the classic GNSS acquisition. In this paper, we analyze the non-coherent acquisition of CSS signals and compare its complexity to that of its coherent counterpart.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"174 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116431661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resource-Efficient Federated Clustering with Past Negatives Pool","authors":"Runxuan Miao, Erdem Koyuncu","doi":"10.1109/ICASSPW59220.2023.10193449","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193449","url":null,"abstract":"Federated learning (FL) provides a global model over data distributed to multiple clients. However, most recent work on FL focuses on supervised learning, and a fully unsupervised federated clustering scheme has remained an open problem. In this context, contrastive learning (CL) trains distinguishable instance embeddings without labels. However, most CL techniques are restricted to centralized data. In this work, we consider the problem of clustering data that is distributed to multiple clients using FL and CL. We propose a federated clustering framework with a novel past negatives pool (PNP) for intelligently selecting positive and negative samples for CL. PNP benefits FL and CL simultaneously: it alleviates class collision in CL and reduces client drift in FL. PNP thus provides a higher accuracy for a given constraint on the communication rounds, which makes it suitable for networks with limited communication and computation resources. Numerical results show that the resulting FedPNP scheme achieves superior performance in solving federated clustering problems on benchmark datasets including CIFAR-10 and CIFAR-100, especially in non-iid settings.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116719957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Pose-Invariant Audio-Visual Speech Enhancement in the Wild for Next-Generation Multi-Modal Hearing Aids","authors":"M. Gogate, K. Dashtipour, Amir Hussain","doi":"10.1109/ICASSPW59220.2023.10192961","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10192961","url":null,"abstract":"Classical audio-visual (AV) speech enhancement (SE) and separation methods have been successful at operating under constrained environments; however, the speech quality and intelligibility improvement is significantly reduced in unconstrained real-world environments where variations in pose and illumination are encountered. In this paper, we present a novel privacy-preserving approach for real-world unconstrained pose-invariant AV SE and separation that contextually exploits pose-invariant 3D landmark flow features and noisy speech features to selectively suppress unwanted background speech and non-speech noises. In addition, we present a unified architecture that integrates state-of-the-art transformers with temporal convolution neural networks for effective pose-invariant AV SE. The preliminary systematic experimentation on the benchmark multi-pose OuluVS2 and LRS3-TED corpora demonstrates that the privacy-preserving 3D landmark flow features are effective for pose-invariant SE and separation. In addition, the proposed AV SE model significantly outperforms a state-of-the-art audio-only SE model, the oracle ideal binary mask, and the A-only variant of the proposed model in speaker- and noise-independent settings.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124619564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Missing Data Imputation With Graph Neural Networks","authors":"Guillaume Lachaud, Patricia Conde Céspedes, M. Trocan","doi":"10.1109/ICASSPW59220.2023.10193535","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193535","url":null,"abstract":"Missing features in tabular and graph structured data are common: a company may not want to disclose all of their accounting, and users online do not always engage in social platforms in the same way as their peers. Recently, models such as the GRAPE architecture have achieved state-of-the-art results in the task of feature imputation. We present an extension of GRAPE that performs mini-batch learning on datasets which do not fit in the GPU. Moreover, we add preprocessing and post-processing steps that allow the model to be used with graph structured data. We experimentally show the behaviour of the model on an academic citation network under different regimes of missing data. We observe that the model performance starts decreasing when we have less than 1% of observed edges. We additionally perform an ablation study of key elements of the model, such as its capacity, the batch size and the number of layers.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124720275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}