{"title":"Boosting Interactive Image Segmentation by Exploiting Semantic Clues","authors":"Qiaoqiao Wei, Hui Zhang, J. Yong","doi":"10.1109/ICME55011.2023.00026","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00026","url":null,"abstract":"This paper presents a refinement framework for enhancing the accuracy of interactive image segmentation by exploiting all available semantic clues. Interactive image segmentation iteratively improves segmentation masks using an input image and user annotations. The information available in this process ranges from low-level visual features like colors and textures to high-level semantic information, such as user annotations and segmentation results. Despite tremendous efforts to segment the overall object shapes, existing methods underutilize the available semantic clues, causing unsatisfactory boundary quality for segmentation masks. The proposed framework first extracts confidence guidance maps, then suppresses and lifts the predicted probabilities for confident pixels, and finally utilizes color similarities as bases and prediction confidence as guidance to refine the segmentation boundaries. Experimental results demonstrate that the framework has a low computational cost and significantly boosts existing methods on standard benchmarks.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115347456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning a Multilevel Cooperative View Reconstruction Network for Light Field Angular Super-Resolution","authors":"Deyang Liu, Yifan Mao, Xiaofei Zhou, P. An, Yuming Fang","doi":"10.1109/ICME55011.2023.00221","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00221","url":null,"abstract":"Recently, many methods have been proposed to improve the angular resolution of sparsely-sampled Light Field (LF). However, the synthesized dense LF inevitably exhibits blurry edges and artifacts. This paper intents to model the global relations of LF views and quality degradation model by learning a multilevel cooperative view reconstruction network to further enhance LF angular Super-Resolution (SR) performance. The proposed LF angular SR network consists of three sub-networks including the Cooperative Angular Transformer Network (CATNet), the Deblurring Network (DBNet), and the Texture Repair Network (TRNet). The CATNet simultaneously captures global features of all LF views and local features within each view, which benefits in characterizing the inherent LF structure. The DBNet models a quality degradation model by estimating blur kernels to reduce the blurry edges and artifacts. The TRNet focuses on restoring fine-scale texture details. Experimental results over various LF datasets including large baseline LF images demonstrate the significant superiority of our method when compared with state-of-the-art ones.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115700681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Guidelines for Subjective Haptic Quality Assessment: A Case Study on Quality Assessment of Compressed Haptic Signals","authors":"Andréas Pastor, P. Callet","doi":"10.1109/ICME55011.2023.00287","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00287","url":null,"abstract":"Modern systems are multimodal (e.g., video, audio, smell), and haptic feedback provides the user with additional entertainment and sensory immersion. Standard recommendation groups extensively studied and focused on video and audio subjective quality assessment, especially in signal transmission. In that context, subjective quality assessment and Quality of Experience (QoE) of Haptic signals is at its infant age. We propose further analyzing the collected data from a recent subjective quality assessment campaign as part of the MPEG haptic standardization group. In particular, we are addressing the following questions: 1) How the emerging field of haptic signal QoE can benefit from existing efforts of video and audio quality assessment standards? 2) How to detect possible outliers or characterize the rater’s reliability? 3) How does the discriminability of haptic tests increases with the number of raters? Towards this goal, we question if traditional analysis as proposed for audio or video signal are suitable, as well as other state-of-the-art techniques. We also compare the discriminability of the haptics quality assessment tests with other modalities such as audio, video, and immersive content (360° contents). We propose recommendations on the number of raters required to meet the usual discriminability obtained for other perceptual modalities and how to process ratings to remove possible noise and biases. These results could feed future recommendations in standards such as BT500-14 or P.913 but for haptic signals.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121796779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Dense-Sparse Representations for Real-Time Question Answering","authors":"Minyu Sun, Bin Jiang, Chao Yang","doi":"10.1109/ICME55011.2023.00250","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00250","url":null,"abstract":"Existing real-time question answering models have shown speed benefits on open-domain tasks. However, they possess limited phrase representations and are susceptible to information loss, which leads to low accuracy. In this paper, we propose modified contextualized sparse and dense encoders to improve the context embedding quality. For sparse encoding, we propose the JM-Sparse, which utilizes joint multi-head attention to focus on crucial information in different context locations and subsequently learn sparse vectors within an n-gram vocabulary space. Moreover, we leverage the similarity-enhanced dense(SE-Dense) vector to obtain rich contextual dense representations. To effectively combine dense and sparse features, we train the weights of dense and sparse vectors dynamically. Extensive experiments on standard benchmarks demonstrate the effectiveness of the proposed method compared with other query-agnostic models.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125862634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing Optics and Algorithm for Ultra-Thin, High-Speed Lensless Cameras","authors":"Salman Siddique Khan, V. Boominathan, A. Veeraraghavan, K. Mitra","doi":"10.1109/ICME55011.2023.00273","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00273","url":null,"abstract":"There is a growing demand for small, light-weight and low-latency cameras in the robotics and AR/VR community. Mask-based lensless cameras, by design, provide a combined advantage of form-factor, weight and speed. They do so by replacing the classical lens with a thin optical mask and computation. Recent works have explored deep learning based post-processing operations on lensless captures that allow high quality scene reconstruction. However, the ability of deep learning to find the optimal optics for thin lensless cameras has not been explored. In this work, we propose a learning based framework for designing the optics of thin lensless cameras. To highlight the effectiveness of our framework, we learn the optical phase mask for multiple tasks using physics-based neural networks. Specifically, we learn the optimal mask using a weighted loss defined for the following tasks-2D scene reconstructions, optical flow estimation and face detection. We show that mask learned through this framework is better than heuristically designed masks especially for small sensors sizes that allow lower bandwidth and faster readout. Finally, we verify the performance of our learned phase-mask on real data.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125931273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camouflaged Object Detection with Feature Grafting and Distractor Aware","authors":"Yuxuan Song, Xinyue Li, Lin Qi","doi":"10.1109/ICME55011.2023.00419","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00419","url":null,"abstract":"The task of Camouflaged Object Detection (COD) aims to accurately segment camouflaged objects that integrated into the environment, which is more challenging than ordinary detection as the texture between the target and background is visually indistinguishable. In this paper, we proposed a novel Feature Grafting and Distractor Aware network (FDNet) to handle the COD task. Specifically, we use CNN and Transformer to encode multi-scale images in parallel. In order to better explore the advantages of the two encoders, we design a cross-attention-based Feature Grafting Module to graft features extracted from Transformer branch into CNN branch, after which the features are aggregated in the Feature Fusion Module. A Distractor Aware Module is designed to explicitly model the two possible distractor in the COD task to refine the coarse camouflage map. We also proposed the largest artificial camouflaged object dataset which contains 2000 images with annotations, named ACOD2K. We conducted extensive experiments on four widely used benchmark datasets and the ACOD2K dataset. The results show that our method significantly outperforms other state-of-the-art methods. The code and the ACOD2K will be available at https://github.com/syxvision/FDNet.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126044564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Attribute Knowledge for Open-set Action Recognition","authors":"Kaixiang Yang, Junyu Gao, Yangbo Feng, Changsheng Xu","doi":"10.1109/ICME55011.2023.00136","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00136","url":null,"abstract":"Open-set action recognition(OSAR) aims to recognize known classes and reject unknown classes. Most OSAR methods focus on learning a favorable threshold to distinguish known and unknown samples in a pure data-driven manner. However, these methods do not utilize the prior knowledge of action classes. In this paper, we propose to Leverage Attribute Knowledge (LAK) for OSAR. Specifically, the class-attribute knowledge learning is designed to integrate attribute knowledge into the model based on spatial-temporal features. Here, attributes are used as a bridge, linking known and unknown classes implicitly to make up the knowledge gap. Furthermore, a learnable relation matrix is adaptively adjusted during training to obtain the class-attribute relations that are expected to be generalized in open-set settings. Extensive experiments on three popular datasets show that the proposed method achieves state-of-the-art performance.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123332471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Overfitting of Multiple Instance Learning for Whole Slide Image Classification","authors":"Hongjian Song, Jie Tang, Hongzhao Xiao, Juncheng Hu","doi":"10.1109/ICME55011.2023.00100","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00100","url":null,"abstract":"Multiple instance learning(MIL) is widely used for whole slide image(WSI) classification. However, these methods suffer from severe overfitting. In this paper, we introduce two main causes of such overfitting problems by rethinking the MIL task and formulation of attention-based MIL models: (i) The model is sensitive to the proportion of positive regions, and (ii)incorrectly learns the positional relationship of patches (i.e., the order of instances). To this end, we propose recurrent random padding(RRP) module and patch shuffle(PS) module to tackle these two issues, respectively. Furthermore, we present random alignment(RA) algorithm to solve these two overfitting problems simultaneously. On CAMELYON16 and TCGA-NSCLC, the proposed plug-and-play modules improve the performance of six baselines by large margins. The significant and consistent refinement demonstrates the correctness of our theories and the effectiveness of our modules.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123773185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mandari: Multi-Modal Temporal Knowledge Graph-aware Sub-graph Embedding for Next-POI Recommendation","authors":"Xiaoqian Liu, Xiuyun Li, Yuan Cao, Fan Zhang, Xiongnan Jin, Jinpeng Chen","doi":"10.1109/ICME55011.2023.00264","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00264","url":null,"abstract":"Next-POI recommendation aims to explore from user check-in sequence to predict the next possible location to be visited. Existing methods are often difficult to model the implicit association of multi-modal data with user choices. Moreover, traditional methods struggle to fully explore the variation of user preferences at variable time intervals. To tackle these limitations, we propose a Multi-Modal Temporal Knowledge Graph-aware Sub-graph Embedding approach (Mandari). We first construct a novel Multi-Modal Temporal Knowledge Graph. Based on the proposed knowledge graph, we integrate multi-modal information and leverage the graph attention network to calculate sub-graph prediction probability. Next, we implement a temporal knowledge mining method to model the segmentation and periodicity of user check-in and obtain temporal prediction probability. Finally, we fuse temporal prediction probability with the previous sub-graph prediction probability to obtain the final result. Extensive experiments demonstrate that our approach outperforms existing state-of-the-art methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125417784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Explainable Multi-view Semantic Fusion Model for Multimodal Fake News Detection","authors":"Zhi Zeng, Mingmin Wu, Guodong Li, Xiang Li, Zhongqiang Huang, Ying Sha","doi":"10.1109/ICME55011.2023.00215","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00215","url":null,"abstract":"The existing models have been achieved great success in capturing and fusing miltimodal semantics of news. However, they paid more attention to the global information, ignoring the interactions of global and local semantics and the inconsistency between different modalities. Therefore, we propose an explainable multi-view semantic fusion model (EMSFM), where we aggregate the important inconsistent semantics from local and global views to compensate the global information. Inspired by various forms of artificial fake news and real news, we summarize four views of multimodal correlation: consistency and inconsistency in the local and global views. Integrating these four views, our EMSFM can interpretatively establish global and local fusion between consistent and inconsistent semantics in multimodal relations for fake news detection. The extensive experimental results show that the EMSFM can improve the performance of multimodal fake news detection and provide a novel paradigm for explainable multi-view semantic fusion.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125525710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}