{"title":"Boosting Interactive Image Segmentation by Exploiting Semantic Clues","authors":"Qiaoqiao Wei, Hui Zhang, J. Yong","doi":"10.1109/ICME55011.2023.00026","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00026","url":null,"abstract":"This paper presents a refinement framework for enhancing the accuracy of interactive image segmentation by exploiting all available semantic clues. Interactive image segmentation iteratively improves segmentation masks using an input image and user annotations. The information available in this process ranges from low-level visual features like colors and textures to high-level semantic information, such as user annotations and segmentation results. Despite tremendous efforts to segment the overall object shapes, existing methods underutilize the available semantic clues, causing unsatisfactory boundary quality for segmentation masks. The proposed framework first extracts confidence guidance maps, then suppresses and lifts the predicted probabilities for confident pixels, and finally utilizes color similarities as bases and prediction confidence as guidance to refine the segmentation boundaries. Experimental results demonstrate that the framework has a low computational cost and significantly boosts existing methods on standard benchmarks.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115347456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning a Multilevel Cooperative View Reconstruction Network for Light Field Angular Super-Resolution","authors":"Deyang Liu, Yifan Mao, Xiaofei Zhou, P. An, Yuming Fang","doi":"10.1109/ICME55011.2023.00221","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00221","url":null,"abstract":"Recently, many methods have been proposed to improve the angular resolution of sparsely-sampled Light Field (LF). However, the synthesized dense LF inevitably exhibits blurry edges and artifacts. This paper intents to model the global relations of LF views and quality degradation model by learning a multilevel cooperative view reconstruction network to further enhance LF angular Super-Resolution (SR) performance. The proposed LF angular SR network consists of three sub-networks including the Cooperative Angular Transformer Network (CATNet), the Deblurring Network (DBNet), and the Texture Repair Network (TRNet). The CATNet simultaneously captures global features of all LF views and local features within each view, which benefits in characterizing the inherent LF structure. The DBNet models a quality degradation model by estimating blur kernels to reduce the blurry edges and artifacts. The TRNet focuses on restoring fine-scale texture details. Experimental results over various LF datasets including large baseline LF images demonstrate the significant superiority of our method when compared with state-of-the-art ones.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115700681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Guidelines for Subjective Haptic Quality Assessment: A Case Study on Quality Assessment of Compressed Haptic Signals","authors":"Andréas Pastor, P. Callet","doi":"10.1109/ICME55011.2023.00287","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00287","url":null,"abstract":"Modern systems are multimodal (e.g., video, audio, smell), and haptic feedback provides the user with additional entertainment and sensory immersion. Standard recommendation groups extensively studied and focused on video and audio subjective quality assessment, especially in signal transmission. In that context, subjective quality assessment and Quality of Experience (QoE) of Haptic signals is at its infant age. We propose further analyzing the collected data from a recent subjective quality assessment campaign as part of the MPEG haptic standardization group. In particular, we are addressing the following questions: 1) How the emerging field of haptic signal QoE can benefit from existing efforts of video and audio quality assessment standards? 2) How to detect possible outliers or characterize the rater’s reliability? 3) How does the discriminability of haptic tests increases with the number of raters? Towards this goal, we question if traditional analysis as proposed for audio or video signal are suitable, as well as other state-of-the-art techniques. We also compare the discriminability of the haptics quality assessment tests with other modalities such as audio, video, and immersive content (360° contents). We propose recommendations on the number of raters required to meet the usual discriminability obtained for other perceptual modalities and how to process ratings to remove possible noise and biases. These results could feed future recommendations in standards such as BT500-14 or P.913 but for haptic signals.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121796779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Dense-Sparse Representations for Real-Time Question Answering","authors":"Minyu Sun, Bin Jiang, Chao Yang","doi":"10.1109/ICME55011.2023.00250","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00250","url":null,"abstract":"Existing real-time question answering models have shown speed benefits on open-domain tasks. However, they possess limited phrase representations and are susceptible to information loss, which leads to low accuracy. In this paper, we propose modified contextualized sparse and dense encoders to improve the context embedding quality. For sparse encoding, we propose the JM-Sparse, which utilizes joint multi-head attention to focus on crucial information in different context locations and subsequently learn sparse vectors within an n-gram vocabulary space. Moreover, we leverage the similarity-enhanced dense(SE-Dense) vector to obtain rich contextual dense representations. To effectively combine dense and sparse features, we train the weights of dense and sparse vectors dynamically. Extensive experiments on standard benchmarks demonstrate the effectiveness of the proposed method compared with other query-agnostic models.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125862634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing Optics and Algorithm for Ultra-Thin, High-Speed Lensless Cameras","authors":"Salman Siddique Khan, V. Boominathan, A. Veeraraghavan, K. Mitra","doi":"10.1109/ICME55011.2023.00273","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00273","url":null,"abstract":"There is a growing demand for small, light-weight and low-latency cameras in the robotics and AR/VR community. Mask-based lensless cameras, by design, provide a combined advantage of form-factor, weight and speed. They do so by replacing the classical lens with a thin optical mask and computation. Recent works have explored deep learning based post-processing operations on lensless captures that allow high quality scene reconstruction. However, the ability of deep learning to find the optimal optics for thin lensless cameras has not been explored. In this work, we propose a learning based framework for designing the optics of thin lensless cameras. To highlight the effectiveness of our framework, we learn the optical phase mask for multiple tasks using physics-based neural networks. Specifically, we learn the optimal mask using a weighted loss defined for the following tasks-2D scene reconstructions, optical flow estimation and face detection. We show that mask learned through this framework is better than heuristically designed masks especially for small sensors sizes that allow lower bandwidth and faster readout. Finally, we verify the performance of our learned phase-mask on real data.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125931273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camouflaged Object Detection with Feature Grafting and Distractor Aware","authors":"Yuxuan Song, Xinyue Li, Lin Qi","doi":"10.1109/ICME55011.2023.00419","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00419","url":null,"abstract":"The task of Camouflaged Object Detection (COD) aims to accurately segment camouflaged objects that integrated into the environment, which is more challenging than ordinary detection as the texture between the target and background is visually indistinguishable. In this paper, we proposed a novel Feature Grafting and Distractor Aware network (FDNet) to handle the COD task. Specifically, we use CNN and Transformer to encode multi-scale images in parallel. In order to better explore the advantages of the two encoders, we design a cross-attention-based Feature Grafting Module to graft features extracted from Transformer branch into CNN branch, after which the features are aggregated in the Feature Fusion Module. A Distractor Aware Module is designed to explicitly model the two possible distractor in the COD task to refine the coarse camouflage map. We also proposed the largest artificial camouflaged object dataset which contains 2000 images with annotations, named ACOD2K. We conducted extensive experiments on four widely used benchmark datasets and the ACOD2K dataset. The results show that our method significantly outperforms other state-of-the-art methods. The code and the ACOD2K will be available at https://github.com/syxvision/FDNet.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126044564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Attribute Knowledge for Open-set Action Recognition","authors":"Kaixiang Yang, Junyu Gao, Yangbo Feng, Changsheng Xu","doi":"10.1109/ICME55011.2023.00136","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00136","url":null,"abstract":"Open-set action recognition(OSAR) aims to recognize known classes and reject unknown classes. Most OSAR methods focus on learning a favorable threshold to distinguish known and unknown samples in a pure data-driven manner. However, these methods do not utilize the prior knowledge of action classes. In this paper, we propose to Leverage Attribute Knowledge (LAK) for OSAR. Specifically, the class-attribute knowledge learning is designed to integrate attribute knowledge into the model based on spatial-temporal features. Here, attributes are used as a bridge, linking known and unknown classes implicitly to make up the knowledge gap. Furthermore, a learnable relation matrix is adaptively adjusted during training to obtain the class-attribute relations that are expected to be generalized in open-set settings. Extensive experiments on three popular datasets show that the proposed method achieves state-of-the-art performance.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123332471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Overfitting of Multiple Instance Learning for Whole Slide Image Classification","authors":"Hongjian Song, Jie Tang, Hongzhao Xiao, Juncheng Hu","doi":"10.1109/ICME55011.2023.00100","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00100","url":null,"abstract":"Multiple instance learning(MIL) is widely used for whole slide image(WSI) classification. However, these methods suffer from severe overfitting. In this paper, we introduce two main causes of such overfitting problems by rethinking the MIL task and formulation of attention-based MIL models: (i) The model is sensitive to the proportion of positive regions, and (ii)incorrectly learns the positional relationship of patches (i.e., the order of instances). To this end, we propose recurrent random padding(RRP) module and patch shuffle(PS) module to tackle these two issues, respectively. Furthermore, we present random alignment(RA) algorithm to solve these two overfitting problems simultaneously. On CAMELYON16 and TCGA-NSCLC, the proposed plug-and-play modules improve the performance of six baselines by large margins. The significant and consistent refinement demonstrates the correctness of our theories and the effectiveness of our modules.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123773185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mandari: Multi-Modal Temporal Knowledge Graph-aware Sub-graph Embedding for Next-POI Recommendation","authors":"Xiaoqian Liu, Xiuyun Li, Yuan Cao, Fan Zhang, Xiongnan Jin, Jinpeng Chen","doi":"10.1109/ICME55011.2023.00264","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00264","url":null,"abstract":"Next-POI recommendation aims to explore from user check-in sequence to predict the next possible location to be visited. Existing methods are often difficult to model the implicit association of multi-modal data with user choices. Moreover, traditional methods struggle to fully explore the variation of user preferences at variable time intervals. To tackle these limitations, we propose a Multi-Modal Temporal Knowledge Graph-aware Sub-graph Embedding approach (Mandari). We first construct a novel Multi-Modal Temporal Knowledge Graph. Based on the proposed knowledge graph, we integrate multi-modal information and leverage the graph attention network to calculate sub-graph prediction probability. Next, we implement a temporal knowledge mining method to model the segmentation and periodicity of user check-in and obtain temporal prediction probability. Finally, we fuse temporal prediction probability with the previous sub-graph prediction probability to obtain the final result. Extensive experiments demonstrate that our approach outperforms existing state-of-the-art methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125417784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Explainable Multi-view Semantic Fusion Model for Multimodal Fake News Detection","authors":"Zhi Zeng, Mingmin Wu, Guodong Li, Xiang Li, Zhongqiang Huang, Ying Sha","doi":"10.1109/ICME55011.2023.00215","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00215","url":null,"abstract":"The existing models have been achieved great success in capturing and fusing miltimodal semantics of news. However, they paid more attention to the global information, ignoring the interactions of global and local semantics and the inconsistency between different modalities. Therefore, we propose an explainable multi-view semantic fusion model (EMSFM), where we aggregate the important inconsistent semantics from local and global views to compensate the global information. Inspired by various forms of artificial fake news and real news, we summarize four views of multimodal correlation: consistency and inconsistency in the local and global views. Integrating these four views, our EMSFM can interpretatively establish global and local fusion between consistent and inconsistent semantics in multimodal relations for fake news detection. The extensive experimental results show that the EMSFM can improve the performance of multimodal fake news detection and provide a novel paradigm for explainable multi-view semantic fusion.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125525710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}