ACM Multimedia Asia: Latest Publications

Visual Storytelling with Hierarchical BERT Semantic Guidance
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490604
Ruichao Fan, Hanli Wang, Jinjing Gu, Xianhui Liu
Abstract: Visual storytelling, which aims at automatically producing a narrative paragraph for a photo album, remains quite challenging due to the complexity and diversity of photo album content. In addition, open-domain photo albums cover a broad range of topics, which results in highly variable vocabularies and expression styles. In this work, a novel teacher-student visual storytelling framework with hierarchical BERT semantic guidance (HBSG) is proposed to address these challenges. The teacher module consists of two joint tasks: word-level latent topic generation and semantic-guided sentence generation. The first task predicts the latent topic of the story; since no ground-truth topic information is available, a pre-trained BERT model based on visual content and annotated stories is used to mine topics, and the resulting topic vector is distilled into a designed image-topic prediction model. In the semantic-guided sentence generation task, HBSG serves two purposes. The first is to narrow down the language complexity across topics: a co-attention decoder over vision and semantics leverages the latent topics to induce topic-related language models. The second is to employ sentence semantics as an online external linguistic-knowledge teacher module. Finally, an auxiliary loss is devised to transfer linguistic knowledge into the language generation model. Extensive experiments demonstrate the effectiveness of the HBSG framework, which surpasses state-of-the-art approaches on the VIST test set.
Citations: 3
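To make the distillation idea in the abstract above concrete, here is a minimal sketch of distilling a teacher's topic distribution into a student image-topic predictor with a soft-label KL loss. This is not the authors' code; the module names, feature sizes, and the number of topics are illustrative assumptions.

```python
# Sketch only: distill a BERT-derived topic distribution into an image-topic
# student via KL divergence. Sizes (2048-d image features, 50 topics) are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageTopicStudent(nn.Module):
    """Predicts a latent-topic distribution from pooled image features."""
    def __init__(self, img_dim=2048, num_topics=50):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim, 512), nn.ReLU(), nn.Linear(512, num_topics)
        )

    def forward(self, img_feats):            # img_feats: (batch, img_dim)
        return self.mlp(img_feats)           # unnormalized topic logits

def topic_distillation_loss(student_logits, teacher_probs, temperature=2.0):
    """Soft-label KL loss that pushes the student toward the teacher's topics."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, teacher_probs, reduction="batchmean") * temperature ** 2

# Toy usage: teacher_probs would come from a BERT-based topic model over the stories.
student = ImageTopicStudent()
img_feats = torch.randn(4, 2048)
teacher_probs = F.softmax(torch.randn(4, 50), dim=-1)
loss = topic_distillation_loss(student(img_feats), teacher_probs)
loss.backward()
```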
Local Self-Attention on Fine-grained Cross-media Retrieval
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490590
Chen Wang, Yazhou Yao, Qiong Wang, Zhenmin Tang
Abstract: Due to the heterogeneity gap, the representations of different media are inconsistent and lie in different feature spaces, which makes it challenging to measure the fine-grained gap between them. To this end, we propose an attention-space training method that learns common representations of different media data. Specifically, we use local self-attention layers to learn a common attention space between different media, and we propose a similarity concatenation method to capture the content relationship between features. To further improve the robustness of the model, we also train a local position encoding to capture the spatial relationships between features. In this way, the proposed method effectively reduces the gap between different feature distributions in cross-media retrieval and improves fine-grained recognition performance by attending to high-level semantic information. Extensive experiments and ablation studies demonstrate that our method achieves state-of-the-art performance and provides a new pipeline for fine-grained cross-media retrieval. The source code and models are publicly available at: https://github.com/NUST-Machine-Intelligence-Laboratory/SAFGCMHN.
Citations: 2
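A rough sketch of the shared-attention-space idea from the abstract above: project image-patch and text-token features into a common space, run one self-attention module shared by both media, and score pairs with cosine similarity. The dimensions are assumptions, and the similarity-concatenation and local position-encoding components are omitted here.

```python
# Sketch only: a common attention space for two media, scored by cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonAttentionSpace(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, common_dim=512, heads=8):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, common_dim)
        self.txt_proj = nn.Linear(txt_dim, common_dim)
        # One attention module shared by both media, i.e. a common attention space.
        self.attn = nn.MultiheadAttention(common_dim, heads, batch_first=True)

    def encode(self, local_feats):            # (batch, num_local, common_dim)
        attended, _ = self.attn(local_feats, local_feats, local_feats)
        return F.normalize(attended.mean(dim=1), dim=-1)   # pooled, unit-norm

    def forward(self, img_patches, txt_tokens):
        img_emb = self.encode(self.img_proj(img_patches))
        txt_emb = self.encode(self.txt_proj(txt_tokens))
        return img_emb @ txt_emb.t()          # cross-media cosine similarity matrix

model = CommonAttentionSpace()
sims = model(torch.randn(4, 49, 2048), torch.randn(4, 20, 768))  # (4, 4) scores
```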
Private-Share: A Secure and Privacy-Preserving De-Centralized Framework for Large Scale Data Sharing
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3493588
Arun Zachariah, Maha M AlRasheed
Abstract: The various data and privacy regulations introduced around the globe require data to be stored in a secure and privacy-preserving fashion, and non-compliance carries major consequences. This has led to the formation of huge data silos within organizations, making data analysis difficult, increasing the risk of a data breach, and preventing collaborative research. To address this, we present Private-Share, a framework that enables secure sharing of large-scale data. To achieve this goal, Private-Share leverages recent advances in blockchain technology, specifically the InterPlanetary File System and Ethereum.
Citations: 0
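A loose sketch of the general flow the abstract above implies (encrypt locally, store the ciphertext on IPFS, anchor the content identifier on Ethereum). The library choices (cryptography, ipfshttpclient), the local daemon assumption, and the function below are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: encrypt a record, push it to IPFS, and return the CID plus key.
import ipfshttpclient                      # pip install ipfshttpclient
from cryptography.fernet import Fernet     # pip install cryptography

def share_record(raw_bytes: bytes):
    """Encrypt a record, push the ciphertext to IPFS, return (CID, key)."""
    key = Fernet.generate_key()            # symmetric key, shared out-of-band
    ciphertext = Fernet(key).encrypt(raw_bytes)
    with ipfshttpclient.connect() as client:   # assumes a local IPFS daemon
        cid = client.add_bytes(ciphertext)     # content identifier of the blob
    # In a full system the CID (plus access-control metadata) would then be
    # recorded on an Ethereum smart contract, e.g. via web3.py, so that peers
    # can discover and verify the shared data without a central server.
    return cid, key

cid, key = share_record(b"large-scale dataset shard ...")
print("stored on IPFS as", cid)
```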
Pose-aware Outfit Transfer between Unpaired in-the-wild Fashion Images
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490569
Donnaphat Trakulwaranont, Marc A. Kastner, S. Satoh
Abstract: Virtual try-on systems have become popular for visualizing outfits, owing to the importance of individual fashion in many communities. The objective of such a system is to transfer a piece of clothing to another person while preserving its detail and characteristics. Generating a realistic in-the-wild image requires joint visual optimization of the clothing, background, and target person, which makes this task very challenging. In this paper, we develop a method that generates realistic try-on images from unpaired images in in-the-wild datasets. The proposed method starts by generating a mock-up paired image using geometric transfer; the target's pose information is then adjusted using a modified pose-attention module. We combine a reconstruction loss and a content loss to preserve the detail and style of the transferred clothing, the background, and the target person. We evaluate the approach on the Fashionpedia dataset and show promising performance over a baseline approach.
Citations: 1
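The abstract above combines a reconstruction loss with a content loss; a common way to realize such a combination is an L1 pixel loss plus a fixed-VGG feature loss, sketched below. The layer cut-off, the weighting, and the use of VGG16 are assumptions, not details from the paper.

```python
# Sketch only: L1 reconstruction + VGG-feature content loss, with guessed weights.
import torch
import torch.nn as nn
import torchvision.models as models

class ReconContentLoss(nn.Module):
    def __init__(self, content_weight=0.1):
        super().__init__()
        vgg = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)          # fixed feature extractor
        self.vgg = vgg
        self.l1 = nn.L1Loss()
        self.content_weight = content_weight

    def forward(self, generated, target):    # images: (batch, 3, H, W)
        recon = self.l1(generated, target)
        content = self.l1(self.vgg(generated), self.vgg(target))
        return recon + self.content_weight * content

criterion = ReconContentLoss()
loss = criterion(torch.rand(2, 3, 256, 256), torch.rand(2, 3, 256, 256))
```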
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages
ACM Multimedia Asia Pub Date: 2021-11-24 DOI: 10.1145/3469877.3490571
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
Abstract: This paper presents a novel training method for end-to-end scene text recognition. End-to-end scene text recognition offers high recognition accuracy, especially when using a Transformer-based encoder-decoder model, but training a highly accurate end-to-end model requires a large image-to-text paired dataset for the target language, which is difficult to collect for resource-poor languages. To overcome this difficulty, the proposed method utilizes well-prepared large datasets in resource-rich languages such as English to train the encoder-decoder model for a resource-poor language. The key idea is to build a model in which the encoder reflects knowledge of multiple languages while the decoder specializes in the resource-poor language. To this end, the encoder is pre-trained on a multilingual dataset combining the resource-poor and resource-rich languages' datasets, so that it learns language-invariant knowledge for scene text recognition, while the decoder is pre-trained on the resource-poor language's dataset alone, making it better suited to that language. Experiments on Japanese scene text recognition using a small, publicly available dataset demonstrate the effectiveness of the proposed method.
Citations: 1
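A simplified sketch of the staged training idea in the abstract above, using a generic Transformer encoder-decoder. The model sizes, vocabulary size, and the three-stage schedule in the comments are assumptions meant only to illustrate how the encoder and decoder can be pre-trained on different data.

```python
# Sketch only: encoder pre-trained on a multilingual mix, decoder on the
# resource-poor language, then joint fine-tuning.
import torch
import torch.nn as nn

d_model, vocab_poor = 256, 3000
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model, 8, batch_first=True), 4)
decoder = nn.TransformerDecoder(nn.TransformerDecoderLayer(d_model, 8, batch_first=True), 4)
embed_poor = nn.Embedding(vocab_poor, d_model)
head_poor = nn.Linear(d_model, vocab_poor)

def run(img_feats, tgt_tokens):
    """img_feats: (B, S, d_model) visual tokens; tgt_tokens: (B, T) character ids."""
    memory = encoder(img_feats)
    out = decoder(embed_poor(tgt_tokens), memory)
    return head_poor(out)                    # (B, T, vocab_poor) logits

# Stage 1: train `encoder` (with a temporary multilingual output head) on the
#          combined rich+poor dataset to learn language-invariant features.
# Stage 2: discard that head and train `decoder`/`embed_poor`/`head_poor`
#          on the resource-poor dataset alone.
# Stage 3: fine-tune everything end-to-end on the resource-poor dataset.
logits = run(torch.randn(2, 50, d_model), torch.randint(0, vocab_poor, (2, 12)))
```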
Holodeck: Immersive 3D Displays Using Swarms of Flying Light Specks [Extended Abstract]
ACM Multimedia Asia Pub Date: 2021-11-02 DOI: 10.1145/3469877.3493698
Shahram Ghandeharizadeh
Abstract: Unmanned Aerial Vehicles (UAVs) have moved beyond a hobbyist platform to enable environmental monitoring, journalism, the film industry, search and rescue, package delivery, and entertainment. This paper describes 3D displays built from swarms of flying light specks (FLSs). An FLS is a small (hundreds of micrometers in size) UAV with one or more light sources that generate different colors and textures with adjustable brightness. A synchronized swarm of FLSs renders an illumination in a pre-specified 3D volume, an FLS display. An FLS display provides true depth, enabling a user to perceive a scene more completely by viewing its illumination from different angles. An FLS display may be either non-immersive or immersive, and both will support 3D acoustics. Non-immersive FLS displays may be the size of a 1980s computer monitor, enabling a surgical team to observe and control micro robots performing heart surgery inside a patient's body. Immersive FLS displays may be the size of a room, enabling users to interact with objects, e.g., a rock or a teapot. An object with behavior will be constructed using FLS-matter, which will enable a user to touch and manipulate the object, e.g., pick up a teapot or throw a rock. An immersive and interactive FLS display would approximate Star Trek's holodeck. A successful realization of the research ideas presented in this paper will provide fundamental insights into implementing a holodeck using swarms of FLSs. A holodeck would transform the future of human communication and perception, how we interact with information and data, and how we work, learn, play, entertain, receive medical care, and socialize.
Citations: 6
Hierarchical Deep Residual Reasoning for Temporal Moment Localization
ACM Multimedia Asia Pub Date: 2021-10-31 DOI: 10.1145/3469877.3490595
Ziyang Ma, Xianjing Han, Xuemeng Song, Yiran Cui, Liqiang Nie
Abstract: Temporal Moment Localization (TML) in untrimmed videos is a challenging multimedia task that aims at localizing the start and end points of the activity described by a sentence query. Existing methods mainly focus on mining the correlation between video and sentence representations or on how the two modalities are fused. They tend to understand the video and sentence coarsely, ignoring the fact that a sentence can be understood at multiple semantic levels and that the words dominating moment localization are the action and the object reference. Toward this end, we propose a Hierarchical Deep Residual Reasoning (HDRR) model, which decomposes the video and sentence into multi-level representations with different semantics to achieve finer-grained localization. Furthermore, considering that videos with different resolutions and sentences with different lengths differ in how difficult they are to understand, we design simple yet effective Res-BiGRUs for feature fusion, which grasp the useful information in a self-adapting manner. Extensive experiments on the Charades-STA and ActivityNet-Captions datasets demonstrate the superiority of HDRR over other state-of-the-art methods.
Citations: 6
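A small sketch of what a "Res-BiGRU" style fusion block, as named in the abstract above, could look like: a bidirectional GRU whose output is projected back and added residually to its input, so easy inputs can pass through nearly unchanged. The dimensions and the LayerNorm are assumptions.

```python
# Sketch only: residual bidirectional-GRU block over a fused feature sequence.
import torch
import torch.nn as nn

class ResBiGRU(nn.Module):
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.bigru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, dim)    # map 2*hidden back to dim
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                     # x: (batch, seq_len, dim) fused features
        out, _ = self.bigru(x)
        return self.norm(x + self.proj(out))  # residual connection

block = ResBiGRU()
fused = block(torch.randn(2, 64, 512))        # e.g. a video-sentence fused sequence
```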
Improving Camouflaged Object Detection with the Uncertainty of Pseudo-edge Labels
ACM Multimedia Asia Pub Date: 2021-10-29 DOI: 10.1145/3469877.3490587
Nobukatsu Kajiura, Hong Liu, S. Satoh
Abstract: This paper focuses on camouflaged object detection (COD), the task of detecting objects hidden in the background. Most current COD models aim to highlight the target object directly while outputting ambiguous camouflaged boundaries; on the other hand, the performance of models that do consider edge information is not yet satisfactory. To this end, we propose a new framework that makes full use of multiple visual cues, i.e., saliency as well as edges, to refine the predicted camouflaged map. The framework consists of three key components: a pseudo-edge generator, a pseudo-map generator, and an uncertainty-aware refinement module. The pseudo-edge generator estimates the boundary and outputs the pseudo-edge label, while a conventional COD method serves as the pseudo-map generator and outputs the pseudo-map label. An uncertainty-based module then reduces the uncertainty and noise of these two pseudo labels, taking both as input and outputting an edge-accurate camouflaged map. Experiments on various COD datasets demonstrate the effectiveness of our method, with superior performance over existing state-of-the-art methods.
Citations: 15
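An illustrative sketch (not the paper's network) of the two ingredients named in the abstract above: a refinement module that takes the image together with the pseudo map and pseudo edge, and a loss that down-weights pixels where the pseudo map is uncertain (probability near 0.5). Channel sizes and the confidence heuristic are assumptions.

```python
# Sketch only: fuse pseudo map + pseudo edge and weight the loss by confidence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UncertaintyRefine(nn.Module):
    def __init__(self):
        super().__init__()
        # input: RGB image (3) + pseudo map (1) + pseudo edge (1) = 5 channels
        self.net = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, image, pseudo_map, pseudo_edge):
        return self.net(torch.cat([image, pseudo_map, pseudo_edge], dim=1))

def uncertainty_weighted_bce(logits, pseudo_map):
    """Treat the pseudo map as a noisy target; trust confident pixels more."""
    confidence = (pseudo_map - 0.5).abs() * 2           # 0 = uncertain, 1 = confident
    per_pixel = F.binary_cross_entropy_with_logits(logits, pseudo_map, reduction="none")
    return (confidence * per_pixel).mean()

model = UncertaintyRefine()
img = torch.rand(2, 3, 128, 128)
pmap, pedge = torch.rand(2, 1, 128, 128), torch.rand(2, 1, 128, 128)
loss = uncertainty_weighted_bce(model(img, pmap, pedge), pmap)
```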
Patch-Based Deep Autoencoder for Point Cloud Geometry Compression
ACM Multimedia Asia Pub Date: 2021-10-18 DOI: 10.1145/3469877.3490611
Kang-Soo You, Pan Gao
Abstract: The ever-increasing number of 3D applications makes point cloud compression unprecedentedly important. In this paper, we propose a patch-based compression process using deep learning, focusing on lossy point cloud geometry compression. Unlike existing point cloud compression networks, which apply feature extraction and reconstruction to the entire point cloud, we divide the point cloud into patches and compress each patch independently; in the decoding process, the decompressed patches are assembled back into a complete point cloud. In addition, we train the network with a patch-to-patch criterion, i.e., we use the local reconstruction loss for optimization to approximate the global reconstruction optimality. Our method outperforms the state-of-the-art in terms of rate-distortion performance, especially at low bitrates, and the proposed compression process is guaranteed to generate the same number of points as the input. The network model can also be applied to other point cloud reconstruction problems, such as upsampling.
Citations: 12
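A rough sketch of the patch-to-patch criterion described in the abstract above: split a cloud into local patches and compute a Chamfer reconstruction loss per patch instead of over the whole cloud. The nearest-random-seed patching below is only for illustration; the paper's actual patch construction and autoencoder are not reproduced here.

```python
# Sketch only: per-patch Chamfer reconstruction loss for point cloud geometry.
import torch

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3) and b: (M, 3)."""
    d = torch.cdist(a, b)                           # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def split_into_patches(points, num_patches=8):
    """Assign each point to its nearest randomly chosen seed point."""
    seeds = points[torch.randperm(points.shape[0])[:num_patches]]
    assign = torch.cdist(points, seeds).argmin(dim=1)
    return [points[assign == k] for k in range(num_patches)]

cloud = torch.rand(2048, 3)
patches = split_into_patches(cloud)
# Each patch would be encoded/decoded independently; here the "reconstruction"
# is simulated by adding noise, and the local losses are summed.
loss = sum(chamfer(p, p + 0.01 * torch.randn_like(p)) for p in patches if len(p) > 0)
```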
Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation
ACM Multimedia Asia Pub Date: 2021-10-16 DOI: 10.1145/3469877.3490570
Yang Wu, Shirui Feng, Guanbin Li, Liang Lin
Abstract: In this paper, we focus on the navigation problem of embodied question answering (EmbodiedQA), where the lack of experience and common-sense information essentially results in failure to find the target when the robot is spawned in an unknown environment. We present a route planning method named the Path Estimation and Memory Recalling (PEMR) framework. PEMR includes a "looking ahead" process, a visual feature extractor module that estimates feasible paths for gathering 3D navigational information, and a "looking behind" process, a memory recalling mechanism that fully leverages the past experience collected by the feature extractor. To encourage the navigator to learn more accurate prior expert experience, we improve the original benchmark dataset and provide a family of evaluation metrics for diagnosing both the navigation and the question answering modules. We show strong experimental results of PEMR on the EmbodiedQA navigation task.
Citations: 0
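A loose sketch in the spirit of the "looking behind" mechanism in the abstract above: store features of visited locations in an external memory and recall them by attention when deciding the next move. The memory size, scaling, and write policy are assumptions, not the paper's design.

```python
# Sketch only: attention-based recall over stored features of visited states.
import torch
import torch.nn.functional as F

class EpisodicMemory:
    def __init__(self, dim=256):
        self.keys = torch.empty(0, dim)

    def write(self, feat):                        # feat: (dim,) visited-state feature
        self.keys = torch.cat([self.keys, feat.unsqueeze(0)], dim=0)

    def recall(self, query):                      # query: (dim,) current observation
        if len(self.keys) == 0:
            return torch.zeros_like(query)
        weights = F.softmax(self.keys @ query / query.shape[0] ** 0.5, dim=0)
        return weights @ self.keys                # attention-weighted past experience

memory = EpisodicMemory()
for step_feat in torch.randn(5, 256):             # features from explored steps
    memory.write(step_feat)
context = memory.recall(torch.randn(256))          # recalled context for the planner
```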