Title: RF-based Multi-view Pose Machine for Multi-Person 3D Pose Estimation
Authors: Chunyang Xie, Dongheng Zhang, Zhi Wu, Cong Yu, Yang Hu, Qibin Sun, Yan Chen
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00454
Abstract: In this paper, we present the RF-based Multi-view Pose machine (RF-MvP) for multi-person 3D pose estimation using RF signals. Specifically, we first develop a lightweight anchor-free detector module to locate and crop regions of interest from horizontal and vertical RF signals. Afterward, we propose a Multi-view Fusion Network to unproject the RF signals from the horizontal and vertical millimeter-wave radars into a unified latent space and then calculate their correlation for weighted fusion. Finally, a Spatio-Temporal Attention Network is designed to reconstruct the multi-person 3D skeleton sequences, in which the spatial attention module recovers invisible body parts using non-local correlations among joints, and the temporal attention module refines the 3D pose sequences using temporal coherency learned from frame queries. We evaluate the proposed RF-MvP against state-of-the-art methods on a large-scale dataset with multi-person 3D pose labels and corresponding radar signals. The experimental results show that RF-MvP outperforms all of the baseline methods, locating multi-person 3D keypoints with an average error of 73 mm and generalizing well to new conditions such as occlusion and low illumination.
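
The correlation-based weighted fusion described above can be pictured with a small PyTorch sketch: two already-encoded radar views are projected into a shared latent space and blended location-by-location according to their cosine correlation. Module names, dimensions, and the sigmoid weighting are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of correlation-weighted fusion of two radar views, assuming
# both views have already been encoded into feature maps of the same shape.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewFusion(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int = 128):
        super().__init__()
        # Separate projections map each view into a shared latent space.
        self.proj_h = nn.Conv2d(in_dim, latent_dim, kernel_size=1)
        self.proj_v = nn.Conv2d(in_dim, latent_dim, kernel_size=1)

    def forward(self, feat_h: torch.Tensor, feat_v: torch.Tensor) -> torch.Tensor:
        zh, zv = self.proj_h(feat_h), self.proj_v(feat_v)      # (B, C, H, W) each
        # Per-location correlation between the two views (cosine similarity).
        corr = F.cosine_similarity(zh, zv, dim=1, eps=1e-6)    # (B, H, W)
        w = torch.sigmoid(corr).unsqueeze(1)                   # fusion weight in (0, 1)
        return w * zh + (1.0 - w) * zv                         # weighted fusion

fused = MultiViewFusion(64)(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(fused.shape)  # torch.Size([2, 128, 32, 32])
```
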
Title: Multi-Scale Hybrid Fusion Network for Mandarin Audio-Visual Speech Recognition
Authors: Jinxin Wang, Zhongwen Guo, Chao Yang, Xiaomei Li, Ziyuan Cui
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00116
Abstract: Compared to feature or decision fusion alone, hybrid fusion can further improve audio-visual speech recognition accuracy. Existing works mainly focus on designing the multi-modality feature extraction, interaction, and prediction processes, neglecting useful cross-modality information and the optimal combination of different prediction results. In this paper, we propose a multi-scale hybrid fusion network (MSHF) for Mandarin audio-visual speech recognition. MSHF consists of a feature extraction subnetwork, which uses the proposed multi-scale feature extraction module (MSFE) to obtain multi-scale features, and a hybrid fusion subnetwork, which integrates the intrinsic correlations of the different modalities and optimizes the weights of the per-modality prediction results to achieve the best classification. We further design a feature recognition module (FRM) for accurate audio-visual speech recognition. We conducted experiments on the CAS-VSR-W1k dataset. The experimental results show that the proposed method outperforms the selected competitive baselines and the state of the art, indicating the superiority of our proposed modules.
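
As a rough illustration of hybrid fusion, the sketch below combines feature-level fusion with a learned weighting over three prediction heads (audio-only, visual-only, fused). This is one plausible reading of "optimizing the weights of prediction results for different modalities"; the layer sizes and head structure are assumptions, not the paper's MSFE/FRM design.

```python
# A minimal sketch of hybrid fusion, assuming audio and visual features have
# already been extracted by upstream encoders.
import torch
import torch.nn as nn

class HybridFusionHead(nn.Module):
    def __init__(self, dim_a: int, dim_v: int, num_classes: int):
        super().__init__()
        self.head_a = nn.Linear(dim_a, num_classes)            # decision from audio
        self.head_v = nn.Linear(dim_v, num_classes)            # decision from video
        self.head_f = nn.Linear(dim_a + dim_v, num_classes)    # decision from fused features
        self.weights = nn.Parameter(torch.zeros(3))            # learnable combination weights

    def forward(self, feat_a: torch.Tensor, feat_v: torch.Tensor) -> torch.Tensor:
        logits = torch.stack(
            [self.head_a(feat_a), self.head_v(feat_v),
             self.head_f(torch.cat([feat_a, feat_v], dim=-1))], dim=0)   # (3, B, C)
        w = torch.softmax(self.weights, dim=0).view(3, 1, 1)   # normalized decision weights
        return (w * logits).sum(dim=0)                          # weighted decision fusion

out = HybridFusionHead(256, 512, 1000)(torch.randn(4, 256), torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 1000])
```
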
Title: Unsupervised Fashion Style Learning by Solving Fashion Jigsaw Puzzles
Authors: Jia Chen, Haidongqing Yuan, Fei Fang, Tao Peng, X. Hu
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00317
Abstract: Fashion style learning is the basis for many tasks in fashion AI, such as clothing recommendation, fashion trend analysis, and popularity prediction. Most existing methods rely on the quality and quantity of annotations. This paper proposes an efficient two-step unsupervised fashion style learning framework built on a "Fashion Jigsaw" task and a centroid-based density clustering algorithm. First, we design the "Fashion Jigsaw" unsupervised learning task according to the distribution of fashion elements in full-body fashion images. By splitting and recovering fashion images, we pre-train a model that can extract both intra-image and inter-image information. Second, we propose a centroid-based density clustering algorithm and introduce the concept of a "centroid" to cluster fashion image features and represent fashion styles. Meanwhile, we keep the noise features to discover newly emerging fashion styles. Experimental results demonstrate the effectiveness of the proposed method.
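
The "Fashion Jigsaw" pretext task follows the standard jigsaw self-supervision recipe: cut an image into a grid of tiles, shuffle them with a permutation drawn from a fixed set, and train a network to recover the permutation index. The sketch below shows only the sample construction; the 3x3 grid and the size of the permutation set are illustrative choices, not the paper's settings.

```python
# A minimal sketch of building a jigsaw pretext-task sample.
import random
import torch

def make_jigsaw_sample(image: torch.Tensor, permutations: list[tuple[int, ...]]):
    """image: (C, H, W) with H and W divisible by 3."""
    c, h, w = image.shape
    th, tw = h // 3, w // 3
    # Cut the image into 9 tiles in row-major order.
    tiles = [image[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(3) for j in range(3)]
    label = random.randrange(len(permutations))          # which permutation was used
    shuffled = [tiles[k] for k in permutations[label]]   # reorder the tiles
    return torch.stack(shuffled), label                  # (9, C, th, tw), int

perms = [tuple(torch.randperm(9).tolist()) for _ in range(100)]  # fixed permutation set
tiles, label = make_jigsaw_sample(torch.randn(3, 96, 96), perms)
print(tiles.shape, label)  # torch.Size([9, 3, 32, 32]) <permutation index>
```
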
Title: Edge-Aware Mirror Network for Camouflaged Object Detection
Authors: Dongyue Sun, Shiyao Jiang, Lin Qi
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00420
Abstract: Existing edge-aware camouflaged object detection (COD) methods normally output the edge prediction at an early stage. However, edges are important and fundamental cues for the subsequent segmentation task. Due to the high visual similarity between camouflaged targets and their surroundings, an edge prior predicted at an early stage usually introduces erroneous foreground-background assignments and contaminates the features used for segmentation. To tackle this problem, we propose a novel Edge-aware Mirror Network (EAMNet), which models edge detection and camouflaged object segmentation as a cross-refinement process. More specifically, EAMNet has a two-branch architecture, in which a segmentation-induced edge aggregation module and an edge-induced integrity aggregation module cross-guide the segmentation branch and the edge detection branch. Finally, a guided-residual channel attention module, which leverages residual connections and gated convolution, better extracts structural details from low-level features. Quantitative and qualitative experimental results show that EAMNet outperforms existing cutting-edge baselines on three widely used COD datasets. Code is available at https://github.com/sdy1999/EAMNet.
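
One plausible reading of the guided-residual channel attention module is a gated convolution followed by squeeze-and-excitation style channel attention and a residual connection, as sketched below; the exact layer arrangement is an assumption, not the released EAMNet code.

```python
# A minimal sketch of a gated convolution combined with channel attention and
# a residual connection; layer sizes are illustrative.
import torch
import torch.nn as nn

class GatedResidualChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.feat = nn.Conv2d(channels, channels, 3, padding=1)   # content branch
        self.gate = nn.Conv2d(channels, channels, 3, padding=1)   # gating branch
        self.se = nn.Sequential(                                   # squeeze-and-excitation style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gated = self.feat(x) * torch.sigmoid(self.gate(x))   # gated convolution
        attended = gated * self.se(gated)                     # channel attention
        return x + attended                                    # residual connection

y = GatedResidualChannelAttention(64)(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```
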
Title: 2S-DFN: Dual-semantic Decoding Fusion Networks for Fine-grained Image Recognition
Authors: Pufen Zhang, Peng Shi, Song Zhang
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/icme55011.2023.00012
Abstract: In previous fine-grained image recognition (FGIR) methods, a single global or local semantic fusion view may not comprehensively reveal the semantic associations between image and text. In addition, an encoding fusion strategy cannot fuse the semantics finely, because low-order text semantic dependencies and irrelevant semantic concepts are fused. To address these issues, a novel Dual-Semantic Decoding Fusion Network (2S-DFN) is proposed for FGIR. Specifically, a multilayer text semantic encoder is first constructed to extract higher-order semantic dependencies within the text. To obtain sufficient semantic associations, two decoding semantic fusion streams are symmetrically designed from the global and local perspectives. Moreover, by implanting text features into the semantic fusion layers in a decoding manner and cascading them deeply, the two streams finely fuse the semantics of text and image. Extensive experiments demonstrate the effectiveness of the proposed method, and 2S-DFN attains state-of-the-art results on two benchmark datasets.
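
The "decoding manner" of implanting text features can be read as cross-attention in which image tokens query text tokens, so text semantics flow into the visual stream at each fusion layer. The sketch below shows one such layer; dimensions and the feed-forward design are illustrative, not the paper's exact architecture.

```python
# A minimal sketch of decoding-style fusion: image tokens act as queries and
# text tokens as keys/values in cross-attention.
import torch
import torch.nn as nn

class DecodingFusionLayer(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, img_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        # Image tokens query the text tokens; the attended text is added back.
        attended, _ = self.cross_attn(img_tokens, text_tokens, text_tokens)
        x = self.norm1(img_tokens + attended)
        return self.norm2(x + self.ffn(x))

fused = DecodingFusionLayer()(torch.randn(2, 49, 256), torch.randn(2, 16, 256))
print(fused.shape)  # torch.Size([2, 49, 256])
```
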
Title: Early Diagnosis of Alzheimer's Disease Based on Multimodal Hypergraph Attention Network
Authors: Yi Li, Baoyao Yang, Dan Pan, An Zeng, Long Wu, Yang Yang
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00041
Abstract: Alzheimer's disease (AD) is a typical neurodegenerative disease involving multiple pathogenic factors. Early detection is the key to effective treatment of AD. However, most methods are developed on data from a single modality and ignore the relationships among subjects. In machine learning, hypergraphs can be used to express relationships among objects. In light of this, a framework for early diagnosis of Alzheimer's disease based on a multimodal hypergraph attention network is proposed in this paper. Specifically, we combine multimodal features to construct a cross-modal hypergraph, which represents the high-order structural relationships among subjects. Finally, a hypergraph attention network is used to fuse the hypergraphs and perform the final classification. Our experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database show that the proposed method achieves better classification performance than the most advanced methods.
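
To make the hypergraph idea concrete, the sketch below builds k-NN hyperedges over subject features and applies one plain HGNN-style hypergraph convolution. It is a generic illustration: the paper's attention-based hypergraph fusion and its cross-modal construction are not reproduced here.

```python
# A minimal sketch of a k-NN hypergraph plus one hypergraph convolution.
import torch
import torch.nn as nn

def knn_incidence(feats: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Each subject spawns one hyperedge containing its k nearest neighbours."""
    dist = torch.cdist(feats, feats)                   # (N, N) pairwise distances
    idx = dist.topk(k, largest=False).indices          # k nearest neighbours (incl. self)
    H = torch.zeros(feats.size(0), feats.size(0))      # (num_nodes, num_hyperedges)
    H.scatter_(0, idx.t(), 1.0)                        # node idx[j, a] belongs to hyperedge j
    return H

class HypergraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        Dv = H.sum(dim=1).clamp(min=1)                 # node degrees
        De = H.sum(dim=0).clamp(min=1)                 # hyperedge degrees
        # X' = Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta (uniform hyperedge weights)
        x = self.theta(x)
        x = H.t() @ (x / Dv.sqrt().unsqueeze(1))
        x = H @ (x / De.unsqueeze(1))
        return x / Dv.sqrt().unsqueeze(1)

feats = torch.randn(100, 64)                           # e.g. concatenated multimodal features
out = HypergraphConv(64, 32)(feats, knn_incidence(feats))
print(out.shape)  # torch.Size([100, 32])
```
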
Title: Be-or-Not Prompt Enhanced Hard Negatives Generating For Memes Category Detection
Authors: Jian Cui, Lin Li, Xiaohui Tao
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00038
Abstract: Memes are one of the most popular media in online disinformation campaigns. Their creators often use a variety of rhetorical and psychological techniques to misinform audiences. These characteristics lead to unsatisfactory performance on meme category detection tasks, such as predicting propaganda techniques or whether a meme is harmful. To this end, we propose a novel meme category detection model via Be-or-Not Prompt Enhanced hard Negatives generating (BNPEN). First, BNPEN reformulates category detection as a contrastive learning-based image-text matching (ITM) task through category-padded prompt engineering. Second, we design be-or-not prompt templates that keep the writing style of memes and create hard negative image-text pairs. Finally, our negative generation alleviates the negative-positive-coupling (NPC) effect in contrastive learning, thus improving image-text matching quality. Experimental results on two public datasets show that BNPEN outperforms off-the-shelf multi-modal learning models in terms of F1 and accuracy.
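
The be-or-not idea can be sketched directly: for each category, the affirmative prompt forms the matching caption and its negated counterpart forms a hard negative, and the two are scored contrastively against the image embedding. Prompt wording, encoders, and the temperature below are placeholders, not the paper's templates or loss.

```python
# A minimal sketch of be-or-not prompt pairs and a contrastive ITM loss over
# them; the embeddings stand in for outputs of unspecified encoders.
import torch
import torch.nn.functional as F

def be_or_not_prompts(meme_text: str, category: str) -> tuple[str, str]:
    positive = f"{meme_text} This meme is {category}."       # matching caption
    negative = f"{meme_text} This meme is not {category}."   # be-or-not hard negative
    return positive, negative

def itm_contrastive_loss(img_emb: torch.Tensor,
                         pos_emb: torch.Tensor,
                         neg_emb: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """img_emb/pos_emb/neg_emb: (B, D) L2-normalized embeddings."""
    pos_sim = (img_emb * pos_emb).sum(-1, keepdim=True)          # (B, 1)
    neg_sim = (img_emb * neg_emb).sum(-1, keepdim=True)          # (B, 1)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature  # the positive must win
    return F.cross_entropy(logits, torch.zeros(img_emb.size(0), dtype=torch.long))

print(be_or_not_prompts("when the wifi drops...", "harmful"))
loss = itm_contrastive_loss(F.normalize(torch.randn(8, 256), dim=-1),
                            F.normalize(torch.randn(8, 256), dim=-1),
                            F.normalize(torch.randn(8, 256), dim=-1))
print(loss.item())
```
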
Title: SQT: Debiased Visual Question Answering via Shuffling Question Types
Authors: Tianyu Huai, Shuwen Yang, Junhang Zhang, Guoan Wang, Xinru Yu, Tianlong Ma, Liang He
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00109
Abstract: Visual Question Answering (VQA) aims to obtain answers from image-question pairs. Current VQA models tend to derive answers from the questions alone, ignoring the information in the images. This phenomenon is caused by bias. As indicated by previous studies, the bias in VQA mainly comes from the text modality. Our analysis suggests that the question type is a crucial factor in bias formation. To break the shortcut from question type to answer and thereby de-bias the model, we propose a self-supervised method of Shuffling Question Types (SQT) that reduces bias from the text modality, overcoming the language prior problem by mitigating the question-to-answer bias without introducing external annotations. Moreover, we propose a new objective function for negative samples. Experimental results show that our approach achieves 61.76% accuracy on the VQA-CP v2 dataset, outperforming the state of the art among both self-supervised and supervised methods.
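
Shuffling question types can be approximated by permuting the leading question-type words across a batch so that the type no longer predicts the answer, producing self-supervised negative samples. The two-word prefix heuristic in the sketch is an assumption; the paper's question-type extraction may differ.

```python
# A minimal sketch of shuffling question-type prefixes within a batch.
import random

def shuffle_question_types(questions: list[str], prefix_len: int = 2) -> list[str]:
    prefixes = [" ".join(q.split()[:prefix_len]) for q in questions]
    bodies = [" ".join(q.split()[prefix_len:]) for q in questions]
    shuffled = prefixes[:]
    random.shuffle(shuffled)                      # permute the question types
    return [f"{p} {b}".strip() for p, b in zip(shuffled, bodies)]

batch = ["what color is the umbrella?",
         "how many people are on the bus?",
         "is there a dog in the picture?"]
print(shuffle_question_types(batch))              # type/body mismatch -> negative samples
```
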
Title: ATENet: Adaptive Tiny-Object Enhanced Network for Polyp Segmentation
Authors: Xiaogang Du, Yinghao Wu, Tao Lei, Dongxin Gu, Yinyin Nie, A. Nandi
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00389
Abstract: Polyp segmentation is of great importance for the diagnosis and treatment of colorectal cancer. However, it is difficult to segment polyps accurately due to the large number of tiny polyps and the low contrast between polyps and the surrounding mucosa. To address this issue, we design an Adaptive Tiny-object Enhanced Network (ATENet) for tiny polyp segmentation. The proposed ATENet has two advantages. First, we design an adaptive tiny-object encoder containing three parallel branches, which can effectively extract the shape and position features of tiny polyps and thus improve their segmentation accuracy. Second, we design a simple enhanced feature decoder, which not only suppresses the background noise in feature maps but also supplements detail information to further improve polyp segmentation accuracy. Extensive experiments on three benchmark datasets demonstrate that the proposed ATENet achieves state-of-the-art performance while maintaining low computational complexity.
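
A three-branch block with different receptive fields and learned branch weights is one way to picture the adaptive tiny-object encoder, as sketched below; the kernel choices and softmax weighting are illustrative, not the paper's exact branches.

```python
# A minimal sketch of a three-branch encoder block with adaptive branch weights.
import torch
import torch.nn as nn

class ParallelBranchBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels, 1)                          # point-wise cues
        self.branch2 = nn.Conv2d(channels, channels, 3, padding=1)               # local shape cues
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)   # wider context
        self.weights = nn.Parameter(torch.ones(3))                               # adaptive branch weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.weights, dim=0)
        out = w[0] * self.branch1(x) + w[1] * self.branch2(x) + w[2] * self.branch3(x)
        return out + x                                                            # residual keeps tiny details

y = ParallelBranchBlock(32)(torch.randn(1, 32, 88, 88))
print(y.shape)  # torch.Size([1, 32, 88, 88])
```
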
Title: ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting
Authors: Zepeng Huang, Qi Wan, Junliang Chen, Xiaodong Zhao, Kai Ye, Linlin Shen
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
DOI: 10.1109/ICME55011.2023.00243
Abstract: Scene text spotting has attracted great attention in recent years. Two-stage approaches locate scene text in the first stage and recognize it in the second, but the advantages of jointly training localization and recognition have not been fully explored. In this paper, we present an ADaptive RoI-Align based Transformer for end-to-end Text Spotting (ADATS), which simultaneously locates and recognizes text in a single forward pass. By employing an Adaptive RoI-Align, text features are extracted from the feature extraction network at their original aspect ratio, so that less information is lost when aligning arbitrarily shaped scene text. Attention-based segmentation and recognition heads allow us to optimize detection and recognition simultaneously. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.
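
Aspect-ratio-preserving RoI extraction can be approximated with torchvision's roi_align by choosing the output width per box so that wide text lines are not squeezed into a fixed square grid. The fixed height, width cap, and feature stride below are illustrative settings, not the paper's.

```python
# A minimal sketch of aspect-ratio-aware RoI feature extraction.
import torch
from torchvision.ops import roi_align

def adaptive_roi_align(features: torch.Tensor, box: torch.Tensor,
                       out_h: int = 8, max_w: int = 64,
                       spatial_scale: float = 0.25) -> torch.Tensor:
    """features: (1, C, H, W); box: (4,) as (x1, y1, x2, y2) in image coordinates."""
    w = (box[2] - box[0]).clamp(min=1.0)
    h = (box[3] - box[1]).clamp(min=1.0)
    out_w = int((out_h * w / h).clamp(max=max_w).item())      # keep the box aspect ratio
    rois = torch.cat([torch.zeros(1), box]).unsqueeze(0)      # (1, 5): batch index + box
    return roi_align(features, rois, output_size=(out_h, max(out_w, 1)),
                     spatial_scale=spatial_scale, aligned=True)

feat = torch.randn(1, 256, 64, 64)                            # stride-4 feature map of a 256x256 image
box = torch.tensor([10.0, 40.0, 200.0, 70.0])                 # a wide text instance
print(adaptive_roi_align(feat, box).shape)                    # torch.Size([1, 256, 8, 50])
```
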