{"title":"BI-AVAN: A Brain-Inspired Adversarial Visual Attention Network for Characterizing Human Visual Attention From Neural Activity","authors":"Heng Huang;Lin Zhao;Haixing Dai;Lu Zhang;Xintao Hu;Dajiang Zhu;Tianming Liu","doi":"10.1109/TMM.2024.3443623","DOIUrl":"10.1109/TMM.2024.3443623","url":null,"abstract":"Visual attention is a fundamental mechanism in the human brain, and it inspires the design of attention mechanisms in deep neural networks. However, most of the visual attention studies adopted eye-tracking data rather than the direct measurement of brain activity to characterize human visual attention. In addition, the adversarial relationship between the attention-related objects and attention-neglected background in the human visual system was not fully exploited. To bridge these gaps, we propose a novel brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity. Our BI-AVAN model imitates the biased competition process between attention-related/neglected objects to identify and locate the visual objects in a movie frame the human brain focuses on in an unsupervised manner. We use independent eye-tracking data as ground truth for validation and experimental results show that our model achieves robust and promising results when inferring meaningful human visual attention and mapping the relationship between brain activities and visual stimuli. Our BI-AVAN model contributes to the emerging field of leveraging the brain's functional architecture to inspire and guide the model design in artificial intelligence (AI), e.g., deep neural networks.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"11191-11203"},"PeriodicalIF":8.4,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142178707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Uncertainty Calibration for Federated Time Series Analysis","authors":"Chao Cai;Weide Liu;Xue Xia;Zhenghua Chen;Yuming Fang","doi":"10.1109/TMM.2024.3443627","DOIUrl":"10.1109/TMM.2024.3443627","url":null,"abstract":"Deep learning models for time series analysis often require large-scale labeled datasets for training. However, acquiring such datasets is cost-intensive and challenging, particularly for individual institutions. To overcome this challenge and concern about data confidentiality among different institutions, federated learning (FL) servers as a viable solution to this dilemma by offering a decentralized learning framework. However, the datasets collected by each institution often suffer from imbalance and may not adhere to uniform protocols, leading to diverse data distributions. To address this problem, we design a global model to approximate the global data distribution of all participant clients, then transfer it to local clients as an induction in the training phase. While discrepancies between the approximate distribution and the actual distribution result in uncertainty in the predicted results. Moreover, the diverse data distributions among various clients within the FL framework, combined with the inherent lack of reliability and interpretability in deep learning models, further amplify the uncertainty of the prediction results. To address these issues, we propose an uncertainty calibration method based on Bayesian deep learning techniques, which captures uncertainty by learning a fidelity transformation to reconstruct the output of time series regression and classification tasks, utilizing deterministic pre-trained models. Extensive experiments on the regression dataset (C-MAPSS) and classification datasets (ESR, Sleep-EDF, HAR, and FD) in the Independent and Identically Distributed (IID) and non-IID settings show that our approach effectively calibrates uncertainty within the FL framework and facilitates better generalization performance in both the regression and classification tasks, achieving state-of-the-art performance.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"11151-11163"},"PeriodicalIF":8.4,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142178747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Colored Point Cloud Quality Assessment Using Complementary Features in 3D and 2D Spaces","authors":"Mao Cui;Yun Zhang;Chunling Fan;Raouf Hamzaoui;Qinglan Li","doi":"10.1109/TMM.2024.3443634","DOIUrl":"10.1109/TMM.2024.3443634","url":null,"abstract":"Point Cloud Quality Assessment (PCQA) plays an essential role in optimizing point cloud acquisition, encoding, transmission, and rendering for human-centric visual media applications. In this paper, we propose an objective PCQA model using Complementary Features from 3D and 2D spaces, called CF-PCQA, to measure the visual quality of colored point clouds. First, we develop four effective features in 3D space to represent the perceptual properties of colored point clouds, which include curvature, kurtosis, luminance distance and hue features of points in 3D space. Second, we project the 3D point cloud onto 2D planes using patch projection and extract a structural similarity feature of the projected 2D images in the spatial domain, as well as a sub-band similarity feature in the wavelet domain. Finally, we propose a feature selection and a learning model to fuse high dimensional features and predict the visual quality of the colored point clouds. Extensive experimental results show that the Pearson Linear Correlation Coefficients (PLCCs) of the proposed CF-PCQA were 0.9117, 0.9005, 0.9340 and 0.9826 on the SIAT-PCQD, SJTU-PCQA, WPC2.0 and ICIP2020 datasets, respectively. Moreover, statistical significance tests demonstrate that the CF-PCQA significantly outperforms the state-of-the-art PCQA benchmark schemes on the four datasets.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"11111-11125"},"PeriodicalIF":8.4,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142178743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phase-shifted tACS can modulate cortical alpha waves in human subjects.","authors":"Alexandre Aksenov, Malo Renaud-D'Ambra, Vitaly Volpert, Anne Beuter","doi":"10.1007/s11571-023-09997-1","DOIUrl":"10.1007/s11571-023-09997-1","url":null,"abstract":"<p><p>In the present study, we investigated traveling waves induced by transcranial alternating current stimulation in the alpha frequency band of healthy subjects. Electroencephalographic data were recorded in 12 healthy subjects before, during, and after phase-shifted stimulation with a device combining both electroencephalographic and stimulation capacities. In addition, we analyzed the results of numerical simulations and compared them to the results of identical analysis on real EEG data. The results of numerical simulations indicate that imposed transcranial alternating current stimulation induces a rotating electric field. The direction of waves induced by stimulation was observed more often during at least 30 s after the end of stimulation, demonstrating the presence of aftereffects of the stimulation. Results suggest that the proposed approach could be used to modulate the interaction between distant areas of the cortex. Non-invasive transcranial alternating current stimulation can be used to facilitate the propagation of circulating waves at a particular frequency and in a controlled direction. The results presented open new opportunities for developing innovative and personalized transcranial alternating current stimulation protocols to treat various neurological disorders.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1007/s11571-023-09997-1.</p>","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"24 1","pages":"1575-1592"},"PeriodicalIF":3.1,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11297852/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"52867081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guest Editorial Introduction to the Issue on Pre-Trained Models for Multi-Modality Understanding","authors":"Wengang Zhou;Jiajun Deng;Niculae Sebe;Qi Tian;Alan L. Yuille;Concetto Spampinato;Zakia Hammal","doi":"10.1109/TMM.2024.3384680","DOIUrl":"10.1109/TMM.2024.3384680","url":null,"abstract":"In the ever-evolving domain of multimedia, the significance of multi-modality understanding cannot be overstated. As multimedia content becomes increasingly sophisticated and ubiquitous, the ability to effectively combine and analyze the diverse information from different types of data, such as text, audio, image, video and point clouds, will be paramount in pushing the boundaries of what technology can achieve in understanding and interacting with the world around us. Accordingly, multi-modality understanding has attracted a tremendous amount of research, establishing itself as an emerging topic. Pre-trained models, in particular, have revolutionized this field, providing a way to leverage vast amounts of data without task-specific annotation to facilitate various downstream tasks.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"8291-8296"},"PeriodicalIF":8.4,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10616245","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141862636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Multi-scale Degradation-Based Attack for Boosting the Adversarial Transferability","authors":"Ran Ran;Jiwei Wei;Chaoning Zhang;Guoqing Wang;Yang Yang;Heng Tao Shen","doi":"10.1109/TMM.2024.3428311","DOIUrl":"10.1109/TMM.2024.3428311","url":null,"abstract":"The vulnerability of deep neural networks to adversarial examples has raised huge concerns about the security of these algorithms. Black-box adversarial attacks have received a lot of attention as an influential method for evaluating model robustness. While various sophisticated adversarial attack methods have been proposed, the success rate in the black-box scenario still needs to be improved. To address these issues, we develop an Adaptive Multi-scale Degradation-based Attack method called \u0000<bold>AMDA</b>\u0000. The intuitive motivation behind our approach is that different models tend to have similar attention regions for low-scale images. Specifically, AMDA uses degraded images to generate perturbations at different scales and fuses these perturbations to generate adversarial examples that are insensitive to model changes. Furthermore, we design an adaptive multi-scale perturbation fusion that evaluates the transferability of perturbations at different scales based on noise and adaptively allocates fusion weights to prioritize strong transferability attacks and avoid being compromised by local optima. Extensive experimental results on the ImageNet, CIFAR-100, and CIFAR-10 datasets demonstrate that the proposed AMDA algorithm exhibits competitive performance for both normally trained models and defense models.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10979-10990"},"PeriodicalIF":8.4,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141778393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings","authors":"Xun Jiang;Xing Xu;Zailei Zhou;Yang Yang;Fumin Shen;Heng Tao Shen","doi":"10.1109/TMM.2024.3396272","DOIUrl":"10.1109/TMM.2024.3396272","url":null,"abstract":"Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at retrieving a specific moment where the video content is semantically related to the text query. Conventional VMR methods rely on video-text paired data or specific temporal annotations for each target event. However, the subjectivity and time-consuming nature of the labeling process limit their practicality in multimedia applications. To address this issue, recently researchers proposed a Zero-Shot Learning setting for VMR (ZS-VMR) that trains VMR models without manual supervision signals, thereby reducing the data cost. In this paper, we tackle the challenging ZS-VMR problem with \u0000<italic>Angular Reconstructive Text embeddings (ART)</i>\u0000, generalizing the image-text matching pre-trained model CLIP to the VMR task. Specifically, assuming that visual embeddings are close to their semantically related text embeddings in angular space, our ART method generates pseudo-text embeddings of video event proposals through the hypersphere of CLIP. Moreover, to address the temporal nature of videos, we also design local multimodal fusion learning to narrow the gaps between image-text matching and video-text matching. Our experimental results on two widely used VMR benchmarks, Charades-STA and ActivityNet-Captions, show that our method outperforms current state-of-the-art ZS-VMR methods. It also achieves competitive performance compared to recent weakly-supervised VMR methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"9657-9670"},"PeriodicalIF":8.4,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141743163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prototype-Decomposed Knowledge Distillation for Learning Generalized Federated Representation","authors":"Aming Wu;Jiaping Yu;Yuxuan Wang;Cheng Deng","doi":"10.1109/TMM.2024.3428352","DOIUrl":"10.1109/TMM.2024.3428352","url":null,"abstract":"Federated learning (FL) enables distributed clients to collaboratively learn a global model, suggesting its potential for use in improving data privacy in machine learning. However, although FL has made many advances, its performance usually suffers from degradation due to the impact of domain shift when the trained models are applied to unseen domains. To enhance the model's generalization ability, we focus on solving federated domain generalization, which aims to properly generalize a federated model trained based on multiple source domains belonging to different distributions to an unseen target domain. A novel approach, namely Prototype-Decomposed Knowledge Distillation (PDKD), is proposed herein. Concretely, we first aggregate the local class prototypes that are learned from different clients. Subsequently, Singular Value Decomposition (SVD) is employed to decompose the local prototypes to obtain discriminative and generalized global prototypes that contain rich category-related information. Finally, the global prototypes are sent back to all clients. We exploit knowledge distillation to encourage local client models to distill generalized knowledge from the global prototypes, which boosts the generalization ability. Extensive experiments on multiple datasets demonstrate the effectiveness of our method. In particular, when implemented on the Office dataset, our method outperforms FedAvg by around 13.5%, which shows that our method is instrumental in ameliorating the generalization ability of federated models.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10991-11002"},"PeriodicalIF":8.4,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CenterFormer: A Novel Cluster Center Enhanced Transformer for Unconstrained Dental Plaque Segmentation","authors":"Wenfeng Song;Xuan Wang;Yuting Guo;Shuai Li;Bin Xia;Aimin Hao","doi":"10.1109/TMM.2024.3428349","DOIUrl":"10.1109/TMM.2024.3428349","url":null,"abstract":"Dental plaque segmentation is crucial for maintaining oral health. However, accurately segmenting dental plaque in unconstrained environments can be challenging due to its low contrast and high variability in appearance. While existing transformer-based networks rely on attention mechanisms for each pixel, they do not take into account the relationships between neighboring pixels. Consequently, feature extraction is limited, making it difficult to achieve accurate segmentation of low-contrast images. To address this issue, we propose a simple yet efficient cluster center transformer that improves dental plaque segmentation by clustering image pixels based on multiple levels of feature maps' intensity and texture information. By grouping similar pixels into regions, the proposed method enables the transformers to focus on the local contour and edge around the teeth regions, adapting to the low contrast and high variability of plaque appearance, leading to more accurate and efficient segmentation of dental plaque in dental images. Additionally, we designed Multiple Granularity Perceptions using a pyramid fusion mechanism to capture multiple scales of vision features, thereby enhancing the low-contrast vision features. The proposed method can benefit the dental diagnosis and treatment planning process by improving the accuracy and efficiency of dental plaque segmentation. Our proposed method achieved state-of-the-art results on the dental plaque dataset (Li et al., 2020), with intersection over union (IoU) of 60.91% and pixel accuracy (PA) of 76.81%, all of which were the highest among all methods, demonstrating its effectiveness in plaque segmentation in unconstrained environments.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10965-10978"},"PeriodicalIF":8.4,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}