Proceedings of the 2nd ACM International Conference on Multimedia in Asia最新文献_第3页

Improving face recognition in surveillance video with judicious selection and fusion of representative frames 通过代表性帧的选择和融合，提高监控视频中的人脸识别能力

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446259

Zhaozhen Ding, Qingfang Zheng, Chunhua Hou, Guang Shen

引用次数: 0

Local structure alignment guided domain adaptation with few source samples 局部结构对齐引导域自适应少源样本

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446327

Yuying Cai, Jinfeng Li, Baodi Liu, Weifeng Liu, Kai Zhang, Changsheng Xu

{"title":"Local structure alignment guided domain adaptation with few source samples","authors":"Yuying Cai, Jinfeng Li, Baodi Liu, Weifeng Liu, Kai Zhang, Changsheng Xu","doi":"10.1145/3444685.3446327","DOIUrl":"https://doi.org/10.1145/3444685.3446327","url":null,"abstract":"Domain adaptation has received lots of attention for its high efficiency in dealing with cross-domain learning tasks. Most existing domain adaptation methods adopt the strategies relying on large amounts of source label information, which limits their applications in the real world where only a few label samples are available. We exploit the local geometric connections to tackle this problem and propose a Local Structure Alignment (LSA) guided domain adaptation method in this paper. LSA leverages the Nyström method to describe the distribution difference from the geometric perspective and then perform the distribution alignment between domains. Specifically, LSA constructs a domain-invariant Hessian matrix to locally connect the data of the two domains through minimizing the Nyström approximation error. And then it integrates the domain-invariant Hessian matrix with the semi-supervised learning and finally builds an adaptive semi-supervised model. Extensive experimental results validate that the proposed LSA outperforms the traditional domain adaptation methods especially when only sparse source label information is available.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116463023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Intermediate coordinate based pose non-perspective estimation from line correspondences 基于中间坐标的线对应位姿非透视估计

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446299

Yujia Cao, Zhichao Cui, Yuehu Liu, Xiaojun Lv, K.C.C. Peng

引用次数: 0

Robust visual tracking via scale-aware localization and peak response strength 鲁棒视觉跟踪通过规模感知定位和峰值响应强度

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446274

Ying Wang, Luo Xiong, Kaiwen Du, Yan Yan, Hanzi Wang

{"title":"Robust visual tracking via scale-aware localization and peak response strength","authors":"Ying Wang, Luo Xiong, Kaiwen Du, Yan Yan, Hanzi Wang","doi":"10.1145/3444685.3446274","DOIUrl":"https://doi.org/10.1145/3444685.3446274","url":null,"abstract":"Existing regression-based deep trackers usually localize a target based on a response map, where the highest peak response corresponds to the predicted target location. Nevertheless, when the background distractors appear or the target scale changes frequently, the response map is prone to produce multiple sub-peak responses to interfere with model prediction. In this paper, we propose a robust online tracking method via Scale-Aware localization and Peak Response strength (SAPR), which can learn a discriminative model predictor to estimate a target state accurately. Specifically, to cope with large scale variations, we propose a Scale-Aware Localization (SAL) module to provide multi-scale response maps based on the scale pyramid scheme. Furthermore, to focus on the target response, we propose a simple yet effective Peak Response Strength (PRS) module to fuse the multi-scale response maps and the response maps generated by a correlation filter. According to the response map with the maximum classification score, the model predictor iteratively updates its filter weights for accurate target state estimation. Experimental results on three benchmark datasets, including OTB100, VOT2018 and LaSOT, demonstrate that the proposed SAPR accurately estimates the target state, achieving the favorable performance against several state-of-the-art trackers.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130979850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A background-induced generative network with multi-level discriminator for text-to-image generation 基于背景诱导的多层次鉴别器的文本到图像生成网络

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446291

Ping Wang, Li Liu, Huaxiang Zhang, Tianshi Wang

{"title":"A background-induced generative network with multi-level discriminator for text-to-image generation","authors":"Ping Wang, Li Liu, Huaxiang Zhang, Tianshi Wang","doi":"10.1145/3444685.3446291","DOIUrl":"https://doi.org/10.1145/3444685.3446291","url":null,"abstract":"Most existing text-to-image generation methods focus on synthesizing images using only text descriptions, but this cannot meet the requirement of generating desired objects with given backgrounds. In this paper, we propose a Background-induced Generative Network (BGNet) that combines attention mechanisms, background synthesis, and multi-level discriminator to generate realistic images with given backgrounds according to text descriptions. BGNet takes a multi-stage generation as the basic framework to generate fine-grained images and introduces a hybrid attention mechanism to capture the local semantic correlation between texts and images. To adjust the impact of the given backgrounds on the synthesized images, synthesis blocks are added at each stage of image generation, which appropriately combines the foreground objects generated by the text descriptions with the given background images. Besides, a multi-level discriminator and its corresponding loss function are proposed to optimize the synthesized images. The experimental results on the CUB bird dataset demonstrate the superiority of our method and its ability to generate realistic images with given backgrounds.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133740910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Transfer non-stationary texture with complex appearance 转移具有复杂外观的非静止纹理

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446297

Cheng Peng, Na Qi, Qing Zhu

{"title":"Transfer non-stationary texture with complex appearance","authors":"Cheng Peng, Na Qi, Qing Zhu","doi":"10.1145/3444685.3446297","DOIUrl":"https://doi.org/10.1145/3444685.3446297","url":null,"abstract":"Texture transfer has been successfully applied in computer vision and computer graphics. Since non-stationary textures are usually complex and anisotropic, it is challenging to transfer these textures by simple supervised method. In this paper, we propose a general solution for non-stationary texture transfer, which can preserve the local structure and visual richness of textures. The inputs of our framework are source texture and semantic annotation pair. We record different semantics as different regions and obtain the color and distribution information from different regions, which is used to guide the the low-level texture transfer algorithm. Specifically, we exploit these local distributions to regularize the texture transfer objective function, which is minimized by iterative search and voting steps. In the search step, we search the nearest neighbor fields of source image to target image through Generalized PatchMatch (GPM) algorithm. In the voting step, we calculate histogram weights and coherence weights for different semantic regions to ensure color accuracy and texture continuity, and to further transfer the textures from the source to the target. By comparing with state-of-the-art algorithms, we demonstrate the effectiveness and superiority of our technique in various non-stationary textures.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125112176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A novel system architecture and an automatic monitoring method for remote production 一种新颖的系统架构和远程生产的自动监控方法

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446277

Yasuhiro Mochida, D. Shirai, Takahiro Yamaguchi, S. Kuwabara, H. Nishizawa

引用次数: 1

Multi-focus noisy image fusion based on gradient regularized convolutional sparse representatione 基于梯度正则化卷积稀疏表示的多聚焦噪声图像融合

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446325

Xuanjing Shen, Yunqi Zhang, Haipeng Chen, Di Gai

引用次数: 0

10 years of video browser showdown 10年的视频浏览器对决

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3450215

K. Schoeffmann, Jakub Lokoč, W. Bailer

引用次数: 0

Story segmentation for news broadcast based on primary caption 基于主标题的新闻广播故事分割

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446298

Heling Chen, Zhongyuan Wang, Yingjiao Pei, Baojin Huang, Weiping Tu

{"title":"Story segmentation for news broadcast based on primary caption","authors":"Heling Chen, Zhongyuan Wang, Yingjiao Pei, Baojin Huang, Weiping Tu","doi":"10.1145/3444685.3446298","DOIUrl":"https://doi.org/10.1145/3444685.3446298","url":null,"abstract":"In the information explosion era, people only want to access the news information that they are interested in. News broadcast story segmentation is strongly needed, which is an essential basis for personalized delivery and short video. The existing advanced story boundary segmentation methods utilize semantic similarity of subtitles, thus entailing complex semantic computation. The title texts of news broadcast programs include headline (or primary) captions, dialogue captions and the channel logo, while the same story clips only render one primary caption in most news broadcast. Inspired by this fact, we propose a simple method for story segmentation based on the primary caption, which combines YOLOv3 based primary caption extraction and preliminary location of boundaries. In particular, we introduce mean hash to achieve the fast and reliable comparison for detected small-size primary caption blocks. We further incorporate scene recognition to exact the preliminary boundaries, because the primary captions always appear later than the story boundary. Experimental results on two Chinese news broadcast datasets show that our method enjoys high accuracy in terms of R, P and F1-measures.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131264427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1