2023 IEEE International Conference on Multimedia and Expo (ICME): Latest Publications

ERPG: Enhancing Entity Representations with Prompt Guidance for Complex Named Entity Recognition
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00478
Xingyu Zhu, Feifei Dai, Xiaoyan Gu, Haihui Fan, B. Li, Weiping Wang
{"title":"ERPG: Enhancing Entity Representations with Prompt Guidance for Complex Named Entity Recognition","authors":"Xingyu Zhu, Feifei Dai, Xiaoyan Gu, Haihui Fan, B. Li, Weiping Wang","doi":"10.1109/ICME55011.2023.00478","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00478","url":null,"abstract":"Recently, sequence generation methods are widely used in complex named entity recognition. By selecting high-related tokens to generate complex named entities, these methods obtain several achievements. However, due to lack of guidance in learning output format and ignoring labels in obtaining features, sequence generation methods suffer invalid output and inaccurate recognition. To solve that, we propose an Enhancing Entity Representation method with Prompt Guidance (ERPG). Specifically, in order to reduce invalid output, we design the candidate entity generation module that generate candidate entities and their labels as expected. Besides, to accurately recognize candidate entities, we propose candidate entity refine module, which obtain distinguishable candidate entity representations and filter them accurately. Based on that, our method finally outperforms baselines by 1.20, 1.62 and 0.69 F1 scores in ACE2004, GENIA and CADEC corpora, which proves the effectiveness in complex named entity recognition.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124370876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
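The generate-then-refine idea above can be illustrated with a small span-scoring sketch in PyTorch. This is not the paper's architecture: the span enumeration, mean pooling, dot-product scorer against label (prompt) embeddings, and the 0.6 threshold are all illustrative assumptions.

```python
import torch

def enumerate_spans(seq_len, max_width=4):
    """All (start, end) candidate spans up to max_width tokens (end exclusive)."""
    return [(i, j) for i in range(seq_len)
            for j in range(i + 1, min(i + max_width, seq_len) + 1)]

def candidate_entities(token_emb, label_emb, threshold=0.5):
    """Score every candidate span against every label and keep the confident ones.

    token_emb: (seq_len, dim) contextual token embeddings
    label_emb: (num_labels, dim) label/prompt embeddings in the same space
    Returns a list of (start, end, label_id, score).
    """
    keep = []
    for start, end in enumerate_spans(token_emb.size(0)):
        span_repr = token_emb[start:end].mean(dim=0)          # mean-pooled span representation
        scores = torch.softmax(label_emb @ span_repr, dim=0)  # similarity to each label
        score, label_id = scores.max(dim=0)
        if score.item() > threshold:
            keep.append((start, end, int(label_id), float(score)))
    return keep

# toy usage with random embeddings standing in for encoder outputs
tokens = torch.randn(12, 64)
labels = torch.randn(3, 64)   # e.g. PER / ORG / LOC prompts encoded into the same space
print(candidate_entities(tokens, labels, threshold=0.6))
```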
Variational Information Bottleneck for Cross Domain Object Detection
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00381
Jiangming Chen, Wanxia Deng, Bo Peng, Tianpeng Liu, Yingmei Wei, Li Liu
{"title":"Variational Information Bottleneck for Cross Domain Object Detection","authors":"Jiangming Chen, Wanxia Deng, Bo Peng, Tianpeng Liu, Yingmei Wei, Li Liu","doi":"10.1109/ICME55011.2023.00381","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00381","url":null,"abstract":"Cross domain object detection leverages a labeled source domain to learn an object detector which performs well in a novel unlabeled target domain. Most existing works mainly align the distribution utilizing the entire image knowledge ignoring the obstacles of task-uncorrelated information to alleviate the domain discrepancy. To tackle this issue, we propose a novel module called Variational Instance Disentanglement (VID) based on information theory which aims to decouple the information of task-correlated while filtering out the task-uncorrelated factors at the instance level. Notably, the proposed VID can be used as a plug-and-play module without bringing extra network parameter cost. We equip it with adversarial network and self-training network forming Variational Instance Disentanglement Adversarial Network (VIDAN) and Variational Instance Disentanglement Self-training Network (VIDSN), respectively. Extensive experiments on multiple widely-used scenarios show that the proposed method improves the performance of the popular frameworks and outperforms state-of-the-art methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114566290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
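The instance-level disentanglement builds on the variational information bottleneck; a minimal, generic VIB head in PyTorch is sketched below. The feature and latent dimensions, the throwaway linear classifier and the 1e-3 bottleneck weight are assumptions for illustration, not the paper's VID module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceVIB(nn.Module):
    """Minimal variational information bottleneck over per-instance features.

    Maps an ROI feature to a Gaussian posterior, samples a compressed code z via the
    reparameterization trick, and returns the KL term that penalizes extra information.
    """
    def __init__(self, in_dim=256, z_dim=64):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, feat):
        mu, logvar = self.mu(feat), self.logvar(feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        # KL(q(z|x) || N(0, I)), averaged over instances
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
        return z, kl

vib = InstanceVIB()
roi_feats = torch.randn(32, 256)                 # 32 instance-level features
z, kl = vib(roi_feats)
task_loss = F.cross_entropy(nn.Linear(64, 10)(z), torch.randint(0, 10, (32,)))
loss = task_loss + 1e-3 * kl                     # the weight trades accuracy for compression
```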
Image Compressed Sensing Using Multi-Scale Characteristic Residual Learning
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00275
Shumian Yang, Xinxin Xiang, Fenghua Tong, Dawei Zhao, Xin Li
{"title":"Image Compressed Sensing Using Multi-Scale Characteristic Residual Learning","authors":"Shumian Yang, Xinxin Xiang, Fenghua Tong, Dawei Zhao, Xin Li","doi":"10.1109/ICME55011.2023.00275","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00275","url":null,"abstract":"Deep network-based image compressed sensing (CS) methods have attracted much attention in recent years due to their low reconstruction complexity and high reconstruction quality. However, the existing methods usually use one or multiple convolution layer(s) consisting of convolutional kernels with the same size to extract image features in image sampling, which results in incomplete feature extraction. Besides, the existing models usually focus on the extraction of deep features in image reconstruction, while ignoring the influence of shallow features. To overcome these issues, this paper proposes a multi-scale characteristic residual learning network (dubbed MSCRLNet) for image CS. In this network, convolutional kernels with different sizes are used to capture multi-level spatial features in image sampling, and a multi-scale residual network with channel attention is used to speed up network convergence in image reconstruction. Experiments show that the proposed MSCRLNet outperforms many existing state-of-the-art methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114849609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
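A minimal PyTorch sketch of the two ingredients named in the abstract: parallel convolutions with different kernel sizes, and channel attention on a residual branch. The kernel sizes, channel counts and squeeze-and-excitation layout are assumptions, not the actual MSCRLNet design.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel convolutions with different kernel sizes, fused and re-weighted
    by squeeze-and-excitation style channel attention, with a residual shortcut."""
    def __init__(self, channels=32):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (3, 5, 7)
        )
        self.fuse = nn.Conv2d(3 * channels, channels, 1)
        self.attn = nn.Sequential(                       # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        fused = self.fuse(feats)
        return x + fused * self.attn(fused)              # residual connection

x = torch.randn(1, 32, 64, 64)
print(MultiScaleBlock()(x).shape)                        # torch.Size([1, 32, 64, 64])
```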
Learning High Frequency Surface Functions In Shells
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00112
Han Guo, Yuanlong Yu, Yujie Wang, Xuelin Chen, Yixin Zhuang
{"title":"Learning High Frequency Surface Functions In Shells","authors":"Han Guo, Yuanlong Yu, Yujie Wang, Xuelin Chen, Yixin Zhuang","doi":"10.1109/ICME55011.2023.00112","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00112","url":null,"abstract":"Recently, coordinate-based MLPs have been shown to be powerful representations for 3D surfaces, where learning high-frequency details is facilitated by modulating surface functions with periodic functions [1], [2]. While shortening the periodicity helps in learning high frequencies, it leads to increasing ambiguity, i.e., more points along the axis directions become similar in the embedded space, so that many points on the surface and outside the surface have similar predictions. In addition, short periodicity increases local geometric variations, leading to unexpected noisy artifacts in untrained regions. Unlike existing methods that learn surface functions in a regular cube, we find surfaces within shells, a coarse form of the target surfaces constructed by a binary classifier. The advantage of build surfaces in shells is that MLPs focus on regions of interest, which inherently reduces ambiguity and also promotes training efficiency and test accuracy. We demonstrate the effectiveness of shells and show significant improvements over baseline methods in 3D surface reconstruction from raw point clouds.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115125676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
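A toy sketch of the overall idea: a coordinate MLP with periodic (sin/cos) feature modulation that is only queried inside a shell produced by a binary classifier. The Fourier-feature encoding, network sizes and the spherical stand-in classifier are illustrative assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn

def fourier_features(xyz, num_freqs=8):
    """Periodic (sin/cos) encoding; higher frequencies help fit high-frequency detail."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=xyz.dtype)
    angles = xyz[..., None] * freqs                       # (N, 3, num_freqs)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1).flatten(1)

class SurfaceMLP(nn.Module):
    """Predicts a signed distance from periodically encoded coordinates."""
    def __init__(self, num_freqs=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * 2 * num_freqs, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):
        return self.net(fourier_features(xyz))

def query_in_shell(points, shell_classifier, sdf_mlp):
    """Evaluate the surface function only where the binary shell classifier fires."""
    inside = shell_classifier(points) > 0.5               # (N,) boolean mask
    sdf = torch.full((points.shape[0], 1), float("inf"))
    if inside.any():
        sdf[inside] = sdf_mlp(points[inside])
    return sdf

pts = torch.rand(1000, 3) * 2 - 1
# a thin spherical shell as a stand-in for the learned binary classifier
toy_shell = lambda p: ((p.norm(dim=1) - 0.8).abs() < 0.1).float()
print(query_in_shell(pts, toy_shell, SurfaceMLP()).shape)  # torch.Size([1000, 1])
```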
DSP-Net: Diverse Structure Prior Network for Image Inpainting
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00088
Lin Sun, Chao Yang, Bin Jiang
{"title":"DSP-Net: Diverse Structure Prior Network for Image Inpainting","authors":"Lin Sun, Chao Yang, Bin Jiang","doi":"10.1109/ICME55011.2023.00088","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00088","url":null,"abstract":"The latest deep learning-based approaches have advanced diverse image inpainting task. However, existing methods limit to be aware of the structure information well, which constricts the performance of diverse generations. The intuitive representation of diversity generation is the structure change since the structure is the basis of the image. In this paper, we make full use of the structure information and propose the diverse structure prior network (DSP-Net). Specifically, there are two stages in DSP-Net to generate the diverse structure first and refine the texture next. For the diverse structure generation, we prompt the structural distribution to be similar to the Gaussian distribution to sample the diverse structural prior. With these priors, we refine the texture with a proposed propagation attention module. Meanwhile, we propose a structure diversity loss to enhance the ability of diverse structure generation further. Experiments on benchmark datasets including CelebA-HQ and Places2 indicate that DSP-Net is effective for diverse and visually realistic image restoration.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114733119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
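A rough sketch of how a structural prior can be pushed toward a Gaussian so that diverse structures can be sampled at test time, together with one common form of diversity loss. The encoder layout, latent size and the mode-seeking-style diversity term are assumptions; the paper's propagation attention module is not shown.

```python
import torch
import torch.nn as nn

class StructurePrior(nn.Module):
    """Encodes a structure map (e.g., an edge image) into a Gaussian latent, so
    diverse structural priors can be drawn at test time by sampling z ~ N(0, I)."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)

    def forward(self, edge_map):
        h = self.backbone(edge_map)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # sample a structural code
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
        return z, kl

def diversity_loss(struct_a, struct_b, z_a, z_b, eps=1e-6):
    """Encourage different latent codes to yield visibly different structures."""
    d_struct = (struct_a - struct_b).abs().mean()
    d_latent = (z_a - z_b).abs().mean()
    return -d_struct / (d_latent + eps)

enc = StructurePrior()
z, kl = enc(torch.rand(4, 1, 64, 64))
print(z.shape, kl.item())
```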
Modality-Fusion Spiking Transformer Network for Audio-Visual Zero-Shot Learning
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00080
Wenrui Li, Zhengyu Ma, Liang-Jian Deng, Hengyu Man, Xiaopeng Fan
{"title":"Modality-Fusion Spiking Transformer Network for Audio-Visual Zero-Shot Learning","authors":"Wenrui Li, Zhengyu Ma, Liang-Jian Deng, Hengyu Man, Xiaopeng Fan","doi":"10.1109/ICME55011.2023.00080","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00080","url":null,"abstract":"Audio-visual zero-shot learning (ZSL), which learns to classify video data from the classes not being observed during training, is challenging. In audio-visual ZSL, both semantic and temporal information from different modalities is relevant to each other. However, effectively extracting and fusing information from audio and visual remains an open challenge. In this work, we propose an Audio-Visual Modality-fusion Spiking Transformer network (AVMST) for audio-visual ZSL. To be more specific, AVMST provides a spiking neural network (SNN) module for extracting conspicuous temporal information of each modality, a cross-attention block to effectively fuse the temporal and semantic information, and a transformer reasoning module to further explore the interrelationships of fusion features. To provide robust temporal features, the spiking threshold of the SNN module is adjusted dynamically based on the semantic cues of different modalities. The generated feature map is in accordance with the zero-shot learning property thanks to our proposed spiking transformer’s ability to combine the robustness of SNN feature extraction and the precision of transformer feature inference. Extensive experiments on three benchmark audio-visual datasets (i.e., VGGSound, UCF and ActivityNet) validate that the proposed AVMST outperforms existing state-of-the-art methods by a significant margin. The code and pre-trained models are available at https://github.com/liwr-hit/ICME23_AVMST.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116985422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
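A toy sketch of two of the pieces mentioned above: a leaky integrate-and-fire (LIF) layer producing spike trains over time, and cross-attention used to fuse audio and visual tokens. The fixed threshold and decay are simplifications (the paper adjusts the spiking threshold dynamically from semantic cues), and the dimensions here are arbitrary.

```python
import torch
import torch.nn as nn

class LIFNeuron(nn.Module):
    """A simple leaky integrate-and-fire layer: the membrane potential accumulates
    over time steps and emits a binary spike when it crosses the threshold."""
    def __init__(self, threshold=1.0, decay=0.5):
        super().__init__()
        self.threshold, self.decay = threshold, decay

    def forward(self, x_seq):                      # x_seq: (T, B, D)
        mem, spikes = torch.zeros_like(x_seq[0]), []
        for x_t in x_seq:
            mem = self.decay * mem + x_t
            spike = (mem >= self.threshold).float()
            mem = mem * (1.0 - spike)              # reset the potential after firing
            spikes.append(spike)
        return torch.stack(spikes)                 # (T, B, D) binary spike trains

# cross-attention fusion: audio tokens attend to visual tokens (the reverse direction works symmetrically)
fuse = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
audio = torch.randn(2, 10, 256)                    # (batch, time, dim)
visual = torch.randn(2, 10, 256)
fused, _ = fuse(query=audio, key=visual, value=visual)
spike_feats = LIFNeuron()(fused.transpose(0, 1))   # treat time as the leading axis
print(spike_feats.shape)                           # torch.Size([10, 2, 256])
```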
RASNet: A Reinforcement Assistant Network for Frame Selection in Video-based Posture Recognition
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00366
Ruotong Hu, Xianzhi Wang, Xiaojun Chang, Yeqi Hu, Xiaowei Xin, Xiangqian Ding, Baoqi Guo
{"title":"RASNet: A Reinforcement Assistant Network for Frame Selection in Video-based Posture Recognition","authors":"Ruotong Hu, Xianzhi Wang, Xiaojun Chang, Yeqi Hu, Xiaowei Xin, Xiangqian Ding, Baoqi Guo","doi":"10.1109/ICME55011.2023.00366","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00366","url":null,"abstract":"Most existing video-based posture recognition methods treat frames equally using unified or random sampling strategies, thus losing the temporal relationship information among frames. To address this problem, we propose a lightweight framework, namely RASNet, to adaptively select informative frames for recognition. Specifically, we design a video-suited exploration environment to guide the agent in learning the selection strategy. We introduce the reparametrization method to convert the discrete action space into a continuous space, making the agent robust and random. For the reward part, we design a multi-factor function to reward the agent keeping a balance between frame usage and accuracy. Extensive experiments on three large-scale datasets prove the effectiveness of RASNet, e.g., achieving 85.9% accuracy with fewer 1.15 frames than other state-of-the-art methods on Kinetics 600.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116985538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
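One common way to reparametrize a discrete keep/skip decision into a continuous, differentiable one is Gumbel-Softmax; the sketch below pairs it with a simple accuracy-versus-usage reward. Both the relaxation and the reward weighting are illustrative assumptions, not RASNet's exact environment or reward function.

```python
import torch
import torch.nn.functional as F

def select_frames(frame_logits, tau=1.0, hard=True):
    """Relax the discrete keep/skip decision per frame with Gumbel-Softmax so the
    selection policy stays differentiable (one way to turn a discrete action space
    into a continuous one)."""
    # frame_logits: (num_frames, 2) -> per-frame logits for [skip, keep]
    decisions = F.gumbel_softmax(frame_logits, tau=tau, hard=hard)
    return decisions[:, 1]                        # ~binary keep mask, gradients still flow

def multi_factor_reward(accuracy, keep_mask, usage_weight=0.5):
    """Reward high accuracy while penalizing the fraction of frames used."""
    usage = keep_mask.mean()
    return accuracy - usage_weight * usage

logits = torch.randn(16, 2, requires_grad=True)
keep = select_frames(logits)
reward = multi_factor_reward(accuracy=torch.tensor(0.87), keep_mask=keep)
reward.backward()                                 # gradients reach the selection logits
print(keep.sum().item(), reward.item())
```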
DiST-GAN: Distillation-based Semantic Transfer for Text-Guided Face Generation
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00149
Guoxing Yang, Feifei Fu, Nanyi Fei, Hao Wu, Ruitao Ma, Zhiwu Lu
{"title":"DiST-GAN: Distillation-based Semantic Transfer for Text-Guided Face Generation","authors":"Guoxing Yang, Feifei Fu, Nanyi Fei, Hao Wu, Ruitao Ma, Zhiwu Lu","doi":"10.1109/ICME55011.2023.00149","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00149","url":null,"abstract":"Recently, large-scale pre-training has achieved great success in multi-modal tasks and shown powerful generalization ability due to superior semantic comprehension. In the field of text-to-image synthesis, recent works induce large-scale pre-training with VQ-VAE as a discrete visual tokenizer, which can synthesize realistic images from arbitrary text inputs. However, the quality of images generated by these methods is still inferior to that of images generated by GAN-based methods, especially in some specific domains. To leverage both the superior semantic comprehension of large-scale pre-training models and the powerful ability of GAN-based models in photorealistic image generation, we propose a novel knowledge distillation framework termed DiST-GAN to transfer the semantic knowledge of large-scale visual-language pre-training models (e.g., CLIP) to GAN-based generator for text-guided face image generation. Our DiST-GAN consists of two key components: (1) A new CLIP-based adaptive contrastive loss is devised to ensure the generated images are consistent with the input texts. (2) A language-to-vision (L2V) transformation module is learned to transform token embeddings of each text into an intermediate embedding that is aligned with the image embedding extracted by CLIP. With these two novel components, the semantic knowledge contained in CLIP can thus be transferred to GAN-based generator which preserves the superior ability of photorealistic image generation in the mean time. Extensive results on the Multi-Modal CelebA-HQ dataset show that our DiST-GAN achieves significant improvements over the state-of-the-arts.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117326015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
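A generic symmetric CLIP-space contrastive loss of the kind the first component builds on is sketched below. The embeddings are stand-ins for outputs of a frozen CLIP encoder, and the paper's adaptive weighting and L2V module are not reproduced here.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE between generated-image embeddings and caption embeddings.

    Both inputs are assumed to come from a frozen CLIP encoder, shape (B, D),
    with matched image/text pairs sharing the same row index.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    # pull matched pairs together, push mismatched pairs apart, in both directions
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

fake_img_emb = torch.randn(8, 512)    # stand-ins for CLIP embeddings of generated faces
caption_emb = torch.randn(8, 512)     # stand-ins for CLIP embeddings of the input texts
print(clip_contrastive_loss(fake_img_emb, caption_emb).item())
```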
A Geometrical Characterization on Feature Density of Image Datasets
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00313
Zhen Liang, Changyuan Zhao, Wanwei Liu, Bai Xue, Wenjing Yang
{"title":"A Geometrical Characterization on Feature Density of Image Datasets","authors":"Zhen Liang, Changyuan Zhao, Wanwei Liu, Bai Xue, Wenjing Yang","doi":"10.1109/ICME55011.2023.00313","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00313","url":null,"abstract":"Recently, the interpretability and verification of deep learning have attracted enormous attention from both academic and industrial communities, aiming to gain users’ trust and ease their concerns. To guide learning procedures or data operations carried out in a more interpretable way, in this paper, we put a similar perspective on image datasets, the inputs of deep learning. Based on manifold learning, we work out an interpretable geometrical characterization on the curvity of manifolds to depict the feature density of datasets, which is represented with the ratio of the Euclidean distance and the geodesic distance. It is a noteworthy characteristic of image datasets and we take the dataset compression and enhancement problems as application instances via sample credit assignment with the geometrical information. Experiments on typical image datasets have justified the effectiveness and enormous prospect of the presented geometrical characteristic.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129728510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
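The Euclidean-to-geodesic ratio can be approximated Isomap-style by taking shortest paths over a k-nearest-neighbor graph; a small sketch follows. The k-NN construction, Dijkstra shortest paths and the toy circle data describe one plausible implementation under stated assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def euclidean_to_geodesic_ratio(features, n_neighbors=10):
    """Approximate the mean Euclidean/geodesic distance ratio on the data manifold.

    Geodesic distances are estimated Isomap-style as shortest paths over a
    k-nearest-neighbor graph built in feature space. Ratios near 1 indicate
    locally flat regions; smaller ratios indicate stronger curvature.
    """
    graph = kneighbors_graph(features, n_neighbors, mode="distance")
    geodesic = shortest_path(graph, method="D", directed=False)       # (N, N)
    euclidean = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    mask = np.isfinite(geodesic) & (geodesic > 0)                     # skip self-pairs and disconnected pairs
    return np.mean(euclidean[mask] / geodesic[mask])

# toy example: points on a noisy circle embedded in 2D
theta = np.random.rand(300) * 2 * np.pi
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1) + 0.01 * np.random.randn(300, 2)
print(euclidean_to_geodesic_ratio(pts))
```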
Optimizing Video Streaming for Sustainability and Quality: The Role of Preset Selection in Per-Title Encoding
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00289
Hadi Amirpour, V. V. Menon, Samira Afzal, R.-C. Prodan, C. Timmerer
{"title":"Optimizing Video Streaming for Sustainability and Quality: The Role of Preset Selection in Per-Title Encoding","authors":"Hadi Amirpour, V. V. Menon, Samira Afzal, R.-C. Prodan, C. Timmerer","doi":"10.1109/ICME55011.2023.00289","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00289","url":null,"abstract":"HTTP Adaptive Streaming (HAS) methods divide a video into smaller segments, encoded at multiple pre-defined bitrates to construct a bitrate ladder. Bitrate ladders are usually optimized per title over several dimensions, such as bitrate, resolution, and framerate. This paper adds a new dimension to the bitrate ladder by considering the energy consumption of the encoding process. Video encoders often have multiple pre-defined presets to balance the trade-off between encoding time, energy consumption, and compression efficiency. Faster presets disable certain coding tools defined by the codec to reduce the encoding time at the cost of reduced compression efficiency. Firstly, this paper evaluates the energy consumption and compression efficiency of different x265 presets for 500 video sequences. Secondly, optimized presets are selected for various representations in a bitrate ladder based on the results to guarantee a minimal drop in video quality while saving energy. Finally, a new per title model, which optimizes the trade-off between compression efficiency and energy consumption, is proposed. The experimental results show that decreasing the VMAF score by 0.15 and 0.39 while choosing an optimized preset results in encoding energy savings of 70% and 83%, respectively.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129329103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
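The per-representation preset choice can be read as a constrained selection: take the most energy-efficient preset whose VMAF stays within an allowed drop of the best-quality preset. The sketch below shows only that selection logic; the preset names are real x265 presets, but the VMAF and energy numbers are made up, and the paper's actual per-title model is more elaborate.

```python
from dataclasses import dataclass

@dataclass
class PresetStats:
    preset: str      # x265 preset name
    vmaf: float      # measured quality of the encoded representation
    energy_j: float  # measured encoding energy in joules

def pick_preset(stats, max_vmaf_drop=0.4):
    """Pick the most energy-efficient preset whose VMAF stays within
    max_vmaf_drop of the best-quality (typically slowest) preset."""
    best_vmaf = max(s.vmaf for s in stats)
    eligible = [s for s in stats if best_vmaf - s.vmaf <= max_vmaf_drop]
    return min(eligible, key=lambda s: s.energy_j)

# hypothetical measurements for one representation of one title
measurements = [
    PresetStats("veryslow", vmaf=95.1, energy_j=820.0),
    PresetStats("slow",     vmaf=94.9, energy_j=410.0),
    PresetStats("medium",   vmaf=94.8, energy_j=240.0),
    PresetStats("fast",     vmaf=94.2, energy_j=150.0),
]
print(pick_preset(measurements, max_vmaf_drop=0.39))   # -> the 'medium' row here
```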