{"title":"Similar scene retrieval in soccer videos with weak annotations by multimodal use of bidirectional LSTM","authors":"T. Haruyama, Sho Takahashi, Takahiro Ogawa, M. Haseyama","doi":"10.1145/3444685.3446280","DOIUrl":"https://doi.org/10.1145/3444685.3446280","url":null,"abstract":"This paper presents a novel method to retrieve similar scenes in soccer videos with weak annotations via multimodal use of bidirectional long short-term memory (BiLSTM). The significant increase in the number of different types of soccer videos with the development of technology brings valid assets for effective coaching, but it also increases the work of players and training staff. We tackle this problem with a nontraditional combination of pre-trained models for feature extraction and BiLSTMs for feature transformation. By using the pre-trained models, no training data is required for feature extraction. Then effective feature transformation for similarity calculation is performed by applying BiLSTM trained with weak annotations. This transformation allows for highly accurate capture of soccer video context from less annotation work. In this paper, we achieve an accurate retrieval of similar scenes by multimodal use of this BiLSTM-based transformer trainable with less human effort. The effectiveness of our method was verified by comparative experiments with state-of-the-art using actual soccer video dataset.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124449118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-quality watermarked face inpainting with discriminative residual learning","authors":"Zheng He, Xueli Wei, Kangli Zeng, Zhen Han, Qin Zou, Zhongyuan Wang","doi":"10.1145/3444685.3446261","DOIUrl":"https://doi.org/10.1145/3444685.3446261","url":null,"abstract":"Most existing image inpainting methods assume that the location of the repair area (watermark) is known, but this assumption does not always hold. In addition, the actual watermarked face is in a compressed low-quality form, which is very disadvantageous to the repair due to compression distortion effects. To address these issues, this paper proposes a low-quality watermarked face inpainting method based on joint residual learning with cooperative discriminant network. We first employ residual learning based global inpainting and facial features based local inpainting to render clean and clear faces under unknown watermark positions. Because the repair process may distort the genuine face, we further propose a discriminative constraint network to maintain the fidelity of repaired faces. Experimentally, the average PSNR of inpainted face images is increased by 4.16dB, and the average SSIM is increased by 0.08. TPR is improved by 16.96% when FPR is 10% in face verification.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127980973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-stage structure aware image inpainting based on generative adversarial networks","authors":"Jin Wang, Xi Zhang, Chen Wang, Qing Zhu, Baocai Yin","doi":"10.1145/3444685.3446260","DOIUrl":"https://doi.org/10.1145/3444685.3446260","url":null,"abstract":"In recent years, the image inpainting technology based on deep learning has made remarkable progress, which can better complete the complex image inpainting task compared with traditional methods. However, most of the existing methods can not generate reasonable structure and fine texture details at the same time. To solve this problem, in this paper we propose a two-stage image inpainting method with structure awareness based on Generative Adversarial Networks, which divides the inpainting process into two sub tasks, namely, image structure generation and image content generation. In the former stage, the network generates the structural information of the missing area; while in the latter stage, the network uses this structural information as a prior, and combines the existing texture and color information to complete the image. Extensive experiments are conducted to evaluate the performance of our proposed method on Places2, CelebA and Paris Streetview datasets. The experimental results show the superior performance of the proposed method compared with other state-of-the-art methods qualitatively and quantitatively.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134539340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention feature matching for weakly-supervised video relocalization","authors":"Haoyu Tang, Jihua Zhu, Zan Gao, Tao Zhuo, Zhiyong Cheng","doi":"10.1145/3444685.3446317","DOIUrl":"https://doi.org/10.1145/3444685.3446317","url":null,"abstract":"Localizing the desired video clip for a given query in an untrimmed video has been a hot research topic for multimedia understanding. Recently, a new task named video relocalization, in which the query is a video clip, has been raised. Some methods have been developed for this task, however, these methods often require dense annotations of the temporal boundaries inside long videos for training. A more practical solution is the weakly-supervised approach, which only needs the matching information between the query and video. Motivated by that, we propose a weakly-supervised video relocalization approach based on an attention-based feature matching method. Specifically, it recognizes the video clip by finding the clip whose frames are the most relevant to the query clip frames based on the matching results of the frame embeddings. In addition, an attention module is introduced to identify the frames containing rich semantic correlations in the query video. Extensive experiments on the ActivityNet dataset demonstrate that our method can outperform several weakly-supervised methods consistently and even achieve competing performance to supervised baselines.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114721430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based motion prediction for abnormal action detection","authors":"Yao Tang, Lin Zhao, Zhaoliang Yao, Chen Gong, Jian Yang","doi":"10.1145/3444685.3446316","DOIUrl":"https://doi.org/10.1145/3444685.3446316","url":null,"abstract":"Abnormal action detection is the most noteworthy part of anomaly detection, which tries to identify unusual human behaviors in videos. Previous methods typically utilize future frame prediction to detect frames deviating from the normal scenario. While this strategy enjoys success in the accuracy of anomaly detection, critical information such as the cause and location of the abnormality is unable to be acquired. This paper proposes human motion prediction for abnormal action detection. We employ sequence of human poses to represent human motion, and detect irregular behavior by comparing the predicted pose with the actual pose detected in the frame. Hence the proposed method is able to explain why the action is regarded as irregularity and locate where the anomaly happens. Moreover, pose sequence is robust to noise, complex background and small targets in videos. Since posture information is non-Euclidean data, graph convolutional network is adopted for future pose prediction, which not only leads to greater expressive power but also stronger generalization capability. Experiments are conducted both on the widely used anomaly detection dataset ShanghaiTech and our newly proposed dataset NJUST-Anomaly, which mainly contains irregular behaviors happened in the campus. Our dataset expands the existing datasets by giving more abnormal actions attracting public attention in social security, which happen in more complex scenes and dynamic backgrounds. Experimental results on both datasets demonstrate the superiority of our method over the-state-of-the-art methods. The source code and NJUST-Anomaly dataset will be made public at https://github.com/datangzhengqing/MP-GCN.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"375 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126719536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiplicative angular margin loss for text-based person search","authors":"Peng Zhang, Deqiang Ouyang, Feiyu Chen, Jie Shao","doi":"10.1145/3444685.3446314","DOIUrl":"https://doi.org/10.1145/3444685.3446314","url":null,"abstract":"Text-based person search aims at retrieving the most relevant pedestrian images from database in response to a query in form of natural language description. Existing algorithms mainly focus on embedding textual and visual features into a common semantic space so that the similarity score of features from different modalities can be computed directly. Softmax loss is widely adopted to classify textual and visual features into a correct category in the joint embedding space. However, softmax loss can only help classify features but not increase the intra-class compactness and inter-class discrepancy. To this end, we propose multiplicative angular margin (MAM) loss to learn angularly discriminative features for each identity. The multiplicative angular margin loss penalizes the angle between feature vector and its corresponding classifier vector to learn more discriminative feature. Moreover, to focus more on informative image-text pair, we propose pairwise similarity weighting (PSW) loss to assign higher weight to informative pairs. Extensive experimental evaluations have been conducted on the CUHK-PEDES dataset over our proposed losses. The results show the superiority of our proposed method. Code is available at https://github.com/pengzhanguestc/MAM_loss.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114232488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defense for adversarial videos by self-adaptive JPEG compression and optical texture","authors":"Yupeng Cheng, Xingxing Wei, H. Fu, Shang-Wei Lin, Weisi Lin","doi":"10.1145/3444685.3446308","DOIUrl":"https://doi.org/10.1145/3444685.3446308","url":null,"abstract":"Despite demonstrated outstanding effectiveness in various computer vision tasks, Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Nowadays, adversarial attacks as well as their defenses w.r.t. DNNs in image domain have been intensively studied, and there are some recent works starting to explore adversarial attacks w.r.t. DNNs in video domain. However, the corresponding defense is rarely studied. In this paper, we propose a new two-stage framework for defending video adversarial attack. It contains two main components, namely self-adaptive Joint Photographic Experts Group (JPEG) compression defense and optical texture based defense (OTD). In self-adaptive JPEG compression defense, we propose to adaptively choose an appropriate JPEG quality based on an estimation of moving foreground object, such that the JPEG compression could depress most impact of adversarial noise without losing too much video quality. In OTD, we generate \"optical texture\" containing high-frequency information based on the optical flow map, and use it to edit Y channel (in YCrCb color space) of input frames, thus further reducing the influence of adversarial perturbation. Experimental results on a benchmark dataset demonstrate the effectiveness of our framework in recovering the classification performance on perturbed videos.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"35 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116616076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determining image age with rank-consistent ordinal classification and object-centered ensemble","authors":"Shota Ashida, A. Jatowt, A. Doucet, Masatoshi Yoshikawa","doi":"10.1145/3444685.3446326","DOIUrl":"https://doi.org/10.1145/3444685.3446326","url":null,"abstract":"A significant number of old photographs including ones that are posted online do not contain the information of the date at which they were taken, or this information needs to be verified. Many of such pictures are either scanned analog photographs or photographs taken using a digital camera with incorrect settings. Estimating the date of such pictures is useful for enhancing data quality and its consistency, improving information retrieval and for other related applications. In this study, we propose a novel approach for automatic estimation of the shooting dates of photographs based on a rank-consistent ordinal classification method for neural networks. We also introduce an ensemble approach that involves object segmentation. We conclude that assuring the rank consistency in the ordinal classification as well as combining models trained on segmented objects improve the results of the age determination task.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131526715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relationship graph learning network for visual relationship detection","authors":"Yanan Li, Jun Yu, Yibing Zhan, Zhi Chen","doi":"10.1145/3444685.3446312","DOIUrl":"https://doi.org/10.1145/3444685.3446312","url":null,"abstract":"Visual relationship detection aims to predict the relationships between detected object pairs. It is well believed that the correlations between image components (i.e., objects and relationships between objects) are significant considerations when predicting objects' relationships. However, most current visual relationship detection methods only exploited the correlations among objects, and the correlations among objects' relationships remained underexplored. This paper proposes a relationship graph learning network (RGLN) to explore the correlations among objects' relationships for visual relationship detection. Specifically, RGLN obtains image objects using an object detector, and then, every pair of objects constitutes a relationship proposal. All relationship proposals construct a relationship graph, in which the proposals are treated as nodes. Accordingly, RGLN designs bi-stream graph attention subnetworks to detect relationship proposals, in which one graph attention subnetwork analyzes correlations among relationships based on visual and spatial information, and the other analyzes correlations based on semantic and spatial information. Besides, RGLN exploits a relationship selection subnetwork to ignore redundant information of object pairs with no relationships. We conduct extensive experiments on two public datasets: the VRD and the VG datasets. The experimental results compared with the state-of-the-art demonstrate the competitiveness of RGLN.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133276438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Table detection and cell segmentation in online handwritten documents with graph attention networks","authors":"Ying-Jian Liu, Heng Zhang, Xiao-Long Yun, Jun-Yu Ye, Cheng-Lin Liu","doi":"10.1145/3444685.3446295","DOIUrl":"https://doi.org/10.1145/3444685.3446295","url":null,"abstract":"In this paper, we propose a multi-task learning approach for table detection and cell segmentation with densely connected graph attention networks in free form online documents. Each online document is regarded as a graph, where nodes represent strokes and edges represent the relationships between strokes. Then we propose a graph attention network model to classify nodes and edges simultaneously. According to node classification results, tables can be detected in each document. By combining node and edge classification resutls, cells in each table can be segmented. To improve information flow in the network and enable efficient reuse of features among layers, dense connectivity among layers is used. Our proposed model has been experimentally validated on an online handwritten document dataset IAMOnDo and achieved encouraging results.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133941461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}