Proceedings of the 2nd ACM International Conference on Multimedia in Asia: Latest Publications

Overlap classification mechanism for skeletal bone age assessment
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446286
Pengyi Hao, Xuhang Xie, Tianxing Han, Cong Bai
{"title":"Overlap classification mechanism for skeletal bone age assessment","authors":"Pengyi Hao, Xuhang Xie, Tianxing Han, Cong Bai","doi":"10.1145/3444685.3446286","DOIUrl":"https://doi.org/10.1145/3444685.3446286","url":null,"abstract":"The bone development is a continuous process, however, discrete labels are usually used to represent bone ages. This inevitably causes a semantic gap between actual situation and label representation scope. In this paper, we present a novel method named as overlap classification network to narrow the semantic gap in bone age assessment. In the proposed network, discrete bone age labels (such as 0-228 month) are considered as a sequence that is used to generate a series of subsequences. Then the proposed network makes use of the overlapping information between adjacent subsequences and output several bone age ranges at the same time for one case. The overlapping part of these age ranges is considered as the final predicted bone age. The proposed method without any preprocessing can achieve a much smaller mean absolute error compared with state-of-the-art methods on a public dataset.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125905000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
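To make the overlap mechanism concrete, the following minimal sketch shows how discrete labels can be split into overlapping subsequences and how overlapping range predictions can be reduced to a single age; the subsequence length, stride, and fallback rule are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np

# Hypothetical settings: bone ages 0-228 months split into overlapping
# subsequences (length 60, stride 30). Each subsequence would correspond to
# one classification head; the final age is read from the overlap of the
# predicted ranges.
AGES = np.arange(0, 229)
SUB_LEN, STRIDE = 60, 30
subsequences = [AGES[s:s + SUB_LEN] for s in range(0, len(AGES) - SUB_LEN + 1, STRIDE)]

def predict_age(range_predictions):
    """range_predictions: list of (lo, hi) month ranges, one per head.

    Returns the midpoint of the common overlap of all predicted ranges,
    falling back to the mean of the midpoints when there is no overlap."""
    lo = max(r[0] for r in range_predictions)
    hi = min(r[1] for r in range_predictions)
    if lo > hi:
        return float(np.mean([(a + b) / 2 for a, b in range_predictions]))
    return (lo + hi) / 2

# Three heads agree that the age lies in 90-120 months; the prediction is 105.
print(len(subsequences), predict_age([(60, 120), (90, 150), (85, 130)]))
```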
Fixation guided network for salient object detection
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446288
Zhe Cui, Li Su, Weigang Zhang, Qingming Huang
{"title":"Fixation guided network for salient object detection","authors":"Zhe Cui, Li Su, Weigang Zhang, Qingming Huang","doi":"10.1145/3444685.3446288","DOIUrl":"https://doi.org/10.1145/3444685.3446288","url":null,"abstract":"Convolutional neural network (CNN) based salient object detection (SOD) has achieved great development in recent years. However, in some challenging cases, i.e. small-scale salient object, low contrast salient object and cluttered background, existing salient object detect methods are still not satisfying. In order to accurately detect salient objects, SOD networks need to fix the position of most salient part. Fixation prediction (FP) focuses on the most visual attractive regions, so we think it could assist in locating salient objects. As far as we know, there are few methods jointly consider SOD and FP tasks. In this paper, we propose a fixation guided salient object detection network (FGNet) to leverage the correlation between SOD and FP. FGNet consists of two branches to deal with fixation prediction and salient object detection respectively. Further, an effective feature cooperation module (FCM) is proposed to fuse complementary information between the two branches. Extensive experiments on four popular datasets and comparisons with twelve state-of-the-art methods show that the proposed FGNet well captures the main context of images and locates salient objects more accurately.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125992037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
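The abstract does not spell out the FCM internals, but a simple gated fusion conveys the idea of one branch guiding the other; the sigmoid gate and 1x1 fusion below are assumed details for illustration, not FGNet's actual module.

```python
import torch
import torch.nn as nn

class FeatureCooperation(nn.Module):
    """Toy stand-in for a feature cooperation module: the fixation-branch
    feature produces a spatial gate that re-weights the SOD-branch feature,
    and the two are fused by a 1x1 convolution. Assumed design, not the FCM
    described in the paper."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, sod_feat, fix_feat):
        attn = self.gate(fix_feat)               # B x 1 x H x W gate from the fixation branch
        gated = sod_feat * attn                  # emphasize regions predicted to attract fixations
        return self.fuse(torch.cat([gated, fix_feat], dim=1))

fcm = FeatureCooperation(64)
out = fcm(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(out.shape)   # torch.Size([2, 64, 32, 32])
```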
Destylization of text with decorative elements
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446324
Yuting Ma, Fan Tang, Weiming Dong, Changsheng Xu
{"title":"Destylization of text with decorative elements","authors":"Yuting Ma, Fan Tang, Weiming Dong, Changsheng Xu","doi":"10.1145/3444685.3446324","DOIUrl":"https://doi.org/10.1145/3444685.3446324","url":null,"abstract":"Style text with decorative elements has a strong visual sense, and enriches our daily work, study and life. However, it introduces new challenges to text detection and recognition. In this study, we propose a text destylized framework, that can transform the stylized texts with decorative elements into a type that is easily distinguishable by a detection or recognition model. We arranged and integrate an existing stylistic text data set to train the destylized network. The new destylized data set contains English letters and Chinese characters. The proposed approach enables a framework to handle both Chinese characters and English letters without the need for additional networks. Experiments show that the method is superior to the state-of-the-art style-related models.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134048240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Graph convolution network with node feature optimization using cross attention for few-shot learning
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446278
Ying Liu, Yanbo Lei, Sheikh Faisal Rashid
{"title":"Graph convolution network with node feature optimization using cross attention for few-shot learning","authors":"Ying Liu, Yanbo Lei, Sheikh Faisal Rashid","doi":"10.1145/3444685.3446278","DOIUrl":"https://doi.org/10.1145/3444685.3446278","url":null,"abstract":"Graph convolution network (GCN) is an important method recently developed for few-shot learning. The adjacency matrix in GCN models is constructed based on graph node features to represent the graph node relationships, according to which, the graph network achieves message-passing inference. Therefore, the representation ability of graph node features is an important factor affecting the learning performance of GCN. This paper proposes an improved GCN model with node feature optimization using cross attention, named GCN-NFO. Leveraging on cross attention mechanism to associate the image features of support set and query set, the proposed model extracts more representative and discriminative salient region features as initialization features of graph nodes through information aggregation. Since graph network can represent the relationship between samples, the optimized graph node features transmit information through the graph network, thus implicitly enhances the similarity of intra-class samples and the dissimilarity of inter-class samples, thus enhancing the learning capability of GCN. Intensive experimental results on image classification task using different image datasets prove that GCN-NFO is an effective few-shot learning algorithm which significantly improves the classification accuracy, compared with other existing models.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131151647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
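A minimal sketch of the cross-attention step, assuming scaled dot-product attention from query features to support features and concatenation of the attended summary; the exact aggregation used in GCN-NFO may differ.

```python
import torch
import torch.nn.functional as F

def cross_attention_features(query_feats, support_feats):
    """query_feats: N_q x D, support_feats: N_s x D.
    Each query feature attends over the support features with scaled
    dot-product attention, and the attended summary is concatenated back,
    yielding an N_q x 2D node feature. Shapes and the concatenation are
    illustrative assumptions."""
    d = query_feats.shape[-1]
    scores = query_feats @ support_feats.t() / d ** 0.5   # N_q x N_s similarities
    attn = F.softmax(scores, dim=-1)
    aggregated = attn @ support_feats                      # support information routed to each query
    return torch.cat([query_feats, aggregated], dim=-1)

node_feats = cross_attention_features(torch.randn(5, 128), torch.randn(25, 128))
print(node_feats.shape)   # torch.Size([5, 256])
```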
Structure-preserving extremely low light image enhancement with fractional order differential mask guidance
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446319
Yijun Liu, Zhengning Wang, Ruixu Geng, Hao Zeng, Yi Zeng
{"title":"Structure-preserving extremely low light image enhancement with fractional order differential mask guidance","authors":"Yijun Liu, Zhengning Wang, Ruixu Geng, Hao Zeng, Yi Zeng","doi":"10.1145/3444685.3446319","DOIUrl":"https://doi.org/10.1145/3444685.3446319","url":null,"abstract":"Low visibility and high-level noise are two challenges for low-light image enhancement. In this paper, by introducing fractional order differential, we propose an end-to-end conditional generative adversarial network(GAN) to solve those two problems. For the problem of low visibility, we set up a global discriminator to improve the overall reconstruction quality and restore brightness information. For the high-level noise problem, we introduce fractional order differentiation into both the generator and the discriminator. Compared with conventional end-to-end methods, fractional order can better distinguish noise and high-frequency details, thereby achieving superior noise reduction effects while maintaining details. Finally, experimental results show that the proposed model obtains superior visual effects in low-light image enhancement. By introducing fractional order differential, we anticipate that our framework will enable high quality and detailed image recovery not only in the field of low-light enhancement but also in other fields that require details.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122486176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
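Fractional order differentiation is commonly discretized with Grünwald-Letnikov coefficients; the sketch below builds such a 1-D mask and applies it to a signal. How the paper turns this into a 2-D guidance mask inside the GAN is not reproduced here, and the order and mask length are illustrative choices.

```python
import numpy as np

def gl_coefficients(v, n):
    """First n Grunwald-Letnikov coefficients w_k = (-1)^k * C(v, k) for
    fractional order v: w_0 = 1, w_1 = -v, w_2 = v(v-1)/2, ..."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (1 - (v + 1) / k))
    return np.array(w)

def fractional_diff_1d(signal, v, n=4):
    """Correlate a 1-D signal with the GL mask of order v and length n."""
    w = gl_coefficients(v, n)
    out = np.zeros_like(signal, dtype=float)
    for k, wk in enumerate(w):
        out[k:] += wk * signal[:len(signal) - k]
    return out

print(gl_coefficients(0.5, 4))             # [ 1.     -0.5    -0.125  -0.0625]
step = np.array([0, 0, 0, 1, 1, 1], dtype=float)
print(fractional_diff_1d(step, 0.5))       # strong response at the edge, milder response after it
```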
Adaptive feature aggregation network for nuclei segmentation
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446271
Ruizhe Geng, Zhongyi Huang, Jie Chen
{"title":"Adaptive feature aggregation network for nuclei segmentation","authors":"Ruizhe Geng, Zhongyi Huang, Jie Chen","doi":"10.1145/3444685.3446271","DOIUrl":"https://doi.org/10.1145/3444685.3446271","url":null,"abstract":"Nuclei instance segmentation is essential for cell morphometrics and analysis, playing a crucial role in digital pathology. The problem of variability in nuclei characteristics among diverse cell types makes this task more challenging. Recently, proposal-based segmentation methods with feature pyramid network (FPN) has shown good performance because FPN integrates multi-scale features with strong semantics. However, FPN has information loss of the highest-level feature map and sub-optimal feature fusion strategies. This paper proposes a proposal-based adaptive feature aggregation methods (AANet) to make full use of multi-scale features. Specifically, AANet consists of two components: Context Augmentation Module (CAM) and Feature Adaptive Selection Module (ASM). In feature fusion, CAM focus on exploring extensive contextual information and capturing discriminative semantics to reduce the information loss of feature map at the highest pyramid level. The enhanced features are then sent to ASM to get a combined feature representation adaptively over all feature levels for each RoI. The experiments show our model's effectiveness on two publicly available datasets: the Kaggle 2018 Data Science Bowl dataset and the Multi-Organ nuclei segmentation dataset.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116565078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
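A toy version of adaptive selection over pyramid levels, assuming one feature vector per level for each RoI and a learned softmax over levels; the actual ASM design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveLevelSelection(nn.Module):
    """Toy adaptive selection over pyramid levels: each RoI has one feature
    vector per FPN level, a linear layer scores every level, and the
    softmax-weighted sum is the fused RoI feature. Illustrative only."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, roi_feats):                            # roi_feats: N_roi x L x D
        weights = F.softmax(self.score(roi_feats), dim=1)    # N_roi x L x 1 level weights
        return (weights * roi_feats).sum(dim=1)              # N_roi x D fused feature

asm = AdaptiveLevelSelection(dim=256)
fused = asm(torch.randn(8, 4, 256))                          # 8 RoIs, 4 pyramid levels
print(fused.shape)   # torch.Size([8, 256])
```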
Attention-constraint facial expression recognition
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI: 10.1145/3444685.3446307
Qisheng Jiang
{"title":"Attention-constraint facial expression recognition","authors":"Qisheng Jiang","doi":"10.1145/3444685.3446307","DOIUrl":"https://doi.org/10.1145/3444685.3446307","url":null,"abstract":"To make full use of existing inherent correlation between facial regions and expression, we propose an attention-constraint facial expression recognition method, where the prior correlation between facial regions and expression is integrated into attention weights for extracting better representation. The proposed method mainly consists of four components: feature extractor, local self attention-constraint learner (LSACL), global and local attention-constraint learner (GLACL) and facial expression classifier. Specifically, feature extractor is mainly used to extract features from overall facial image and its corresponding cropped facial regions. Then, the extracted local features from facial regions are fed into local self attention-constraint learner, where some prior rank constraints summarized from facial domain knowledge are embedded into self attention weights. Similarly, the rank correlation constraints between respective facial region and a specified expression are further embedded into global-to-local attention weights when the global feature and local features from local self attention-constraint learner are fed into global and local attention-constraint learner. Finally, the feature from global and local attention-constraint learner and original global feature are fused and passed to facial expression classifier for conducting facial expression recognition. Experiments on two benchmark datasets validate the effectiveness of the proposed method.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132444820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
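One hypothetical way to embed prior rank constraints into attention weights is a hinge penalty on pairs of regions that domain knowledge orders; the formulation below is an illustration only, not the paper's exact constraint.

```python
import torch

def rank_constraint_penalty(attn, rank_pairs, margin=0.05):
    """attn: B x R attention weights over R facial regions.
    rank_pairs: (i, j) pairs where prior knowledge says region i should get
    at least as much attention as region j; violations are penalized with a
    hinge term. Hypothetical formulation of a rank constraint."""
    penalty = attn.new_zeros(())
    for i, j in rank_pairs:
        penalty = penalty + torch.clamp(margin - (attn[:, i] - attn[:, j]), min=0).mean()
    return penalty

attn = torch.softmax(torch.randn(4, 6), dim=-1)              # 6 regions, e.g. mouth, eyes, brows, ...
loss = rank_constraint_penalty(attn, rank_pairs=[(0, 3), (1, 4)])
print(loss.item())
```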
Towards annotation-free evaluation of cross-lingual image captioning
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2020-12-09 DOI: 10.1145/3444685.3446322
Aozhu Chen, Xinyi Huang, Hailan Lin, Xirong Li
{"title":"Towards annotation-free evaluation of cross-lingual image captioning","authors":"Aozhu Chen, Xinyi Huang, Hailan Lin, Xirong Li","doi":"10.1145/3444685.3446322","DOIUrl":"https://doi.org/10.1145/3444685.3446322","url":null,"abstract":"Cross-lingual image captioning, with its ability to caption an unlabeled image in a target language other than English, is an emerging topic in the multimedia field. In order to save the precious human resource from re-writing reference sentences per target language, in this paper we make a brave attempt towards annotation-free evaluation of cross-lingual image captioning. Depending on whether we assume the availability of English references, two scenarios are investigated. For the first scenario with the references available, we propose two metrics, i.e., WMDRel and CLinRel. WMDRel measures the semantic relevance between a model-generated caption and machine translation of an English reference using their Word Mover's Distance. By projecting both captions into a deep visual feature space, CLinRel is a visual-oriented cross-lingual relevance measure. As for the second scenario, which has zero reference and is thus more challenging, we propose CMedRel to compute a cross-media relevance between the generated caption and the image content, in the same visual feature space as used by CLinRel. We have conducted a number of experiments to evaluate the effectiveness of the three proposed metrics. The combination of WMDRel, CLinRel and CMedRel has a Spearman's rank correlation of 0.952 with the sum of BLEU-4, METEOR, ROUGE-L and CIDEr, four standard metrics computed using references in the target language. CMedRel alone has a Spearman's rank correlation of 0.786 with the standard metrics. The promising results show high potential of the new metrics for evaluation with no need of references in the target language.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114484934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
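A rough sketch of a WMD-based relevance score and of the Spearman correlation used to compare metrics; the word-vector file is a placeholder and the 1/(1+d) mapping from distance to relevance is an assumption (the abstract only states that Word Mover's Distance underlies WMDRel).

```python
from gensim.models import KeyedVectors
from scipy.stats import spearmanr

# Placeholder path: pretrained word vectors for the target language.
wv = KeyedVectors.load_word2vec_format("target_lang_vectors.bin", binary=True)

def wmdrel(generated_caption, translated_reference):
    """Word Mover's Distance between the generated caption and the machine
    translation of an English reference, mapped to a bounded relevance score.
    The 1/(1+d) mapping is an assumption."""
    d = wv.wmdistance(generated_caption.lower().split(),
                      translated_reference.lower().split())
    return 1.0 / (1.0 + d)

# Agreement with reference-based metrics can be checked with Spearman's rank
# correlation over a set of captioning systems (toy numbers below).
new_metric = [0.8, 0.4, 0.9, 0.3]
standard_metric = [3.1, 1.2, 3.5, 0.9]
rho, _ = spearmanr(new_metric, standard_metric)
print(rho)   # 1.0 for this toy example
```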
Improving auto-encoder novelty detection using channel attention and entropy minimization
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2020-07-03 DOI: 10.1145/3444685.3446311
Dongyan Guo, Miao Tian, Ying Cui, Xiang Pan, Shengyong Chen
{"title":"Improving auto-encoder novelty detection using channel attention and entropy minimization","authors":"Dongyan Guo, Miao Tian, Ying Cui, Xiang Pan, Shengyong Chen","doi":"10.1145/3444685.3446311","DOIUrl":"https://doi.org/10.1145/3444685.3446311","url":null,"abstract":"Novelty detection is a important research area which mainly solves the classification problem of inliers which usually consists of normal samples and outliers composed of abnormal samples. Auto-encoder is often used for novelty detection. However, the generalization ability of the auto-encoder may cause the undesirable reconstruction of abnormal elements and reduce the identification ability of the model. To solve the problem, we focus on the perspective of better reconstructing the normal samples as well as retaining the unique information of normal samples to improve the performance of auto-encoder for novelty detection. Firstly, we introduce attention mechanism into the task. Under the action of attention mechanism, auto-encoder can pay more attention to the representation of inlier samples through adversarial training. Secondly, we apply the information entropy into the latent layer to make it sparse and constrain the expression of diversity. Experimental results on three public datasets show that the proposed method achieves comparable performance compared with previous popular approaches.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115694903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
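A minimal sketch of the two ingredients, assuming squeeze-and-excitation style channel attention and an entropy penalty on a softmax-normalized latent code; the paper's exact attention module and entropy formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention, assumed here as one
    concrete way to let the auto-encoder emphasize inlier-relevant channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: B x C x H x W
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> per-channel weights
        return x * w[:, :, None, None]

def latent_entropy(z, eps=1e-8):
    """Entropy of the latent code viewed through a softmax over its
    dimensions; minimizing it pushes the code towards sparsity."""
    p = F.softmax(z, dim=1)
    return -(p * (p + eps).log()).sum(dim=1).mean()

feat = ChannelAttention(64)(torch.randn(8, 64, 16, 16))
z = torch.randn(8, 128)                 # latent code from the encoder
entropy_term = 0.1 * latent_entropy(z)  # 0.1 is an assumed weight added to the reconstruction/adversarial losses
print(feat.shape, entropy_term.item())
```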
C3VQG: category consistent cyclic visual question generation
Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2020-05-15 DOI: 10.1145/3444685.3446302
Shagun Uppal, Anish Madan, Sarthak Bhagat, Yi Yu, R. Shah
{"title":"C3VQG: category consistent cyclic visual question generation","authors":"Shagun Uppal, Anish Madan, Sarthak Bhagat, Yi Yu, R. Shah","doi":"10.1145/3444685.3446302","DOIUrl":"https://doi.org/10.1145/3444685.3446302","url":null,"abstract":"Visual Question Generation (VQG) is the task of generating natural questions based on an image. Popular methods in the past have explored image-to-sequence architectures trained with maximum likelihood which have demonstrated meaningful generated questions given an image and its associated ground-truth answer. VQG becomes more challenging if the image contains rich contextual information describing its different semantic categories. In this paper, we try to exploit the different visual cues and concepts in an image to generate questions using a variational autoencoder (VAE) without ground-truth answers. Our approach solves two major shortcomings of existing VQG systems: (i) minimize the level of supervision and (ii) replace generic questions with category relevant generations. Most importantly, by eliminating expensive answer annotations, the required supervision is weakened. Using different categories enables us to exploit different concepts as the inference requires only the image and the category. Mutual information is maximized between the image, question, and answer category in the latent space of our VAE. A novel category consistent cyclic loss is proposed to enable the model to generate consistent predictions with respect to the answer category, reducing redundancies and irregularities. Additionally, we also impose supplementary constraints on the latent space of our generative model to provide structure based on categories and enhance generalization by encapsulating decorrelated features within each dimension. Through extensive experiments, the proposed model, C3VQG outperforms state-of-the-art VQG methods with weak supervision.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129302118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
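The cyclic category-consistency idea can be sketched as re-classifying the generated question into its conditioning category; all module names and shapes below are hypothetical stand-ins for illustration.

```python
import torch
import torch.nn as nn

class ToyQuestionGenerator(nn.Module):
    """Stand-in for the VAE decoder: produces a 'question embedding' from an
    image feature and an answer-category id. All names and sizes are
    hypothetical."""
    def __init__(self, img_dim=512, num_categories=10, hidden=256):
        super().__init__()
        self.cat_embed = nn.Embedding(num_categories, hidden)
        self.mlp = nn.Sequential(nn.Linear(img_dim + hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))

    def forward(self, img_feat, category):
        return self.mlp(torch.cat([img_feat, self.cat_embed(category)], dim=-1))

generator = ToyQuestionGenerator()
category_classifier = nn.Linear(256, 10)   # predicts the answer category back from the question
ce = nn.CrossEntropyLoss()

img_feat = torch.randn(4, 512)
category = torch.randint(0, 10, (4,))
question_emb = generator(img_feat, category)
# Cyclic category-consistency term: the generated question must remain
# classifiable into the category it was conditioned on.
cycle_loss = ce(category_classifier(question_emb), category)
print(cycle_loss.item())
```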