Comput. Vis. Image Underst.: Latest Articles

Progressive Scene Text Erasing with Self-Supervision
Comput. Vis. Image Underst. Pub Date: 2022-07-23 DOI: 10.48550/arXiv.2207.11469
Xiangcheng Du, Zhao Zhou, Yingbin Zheng, Xingjiao Wu, Tianlong Ma, Cheng Jin
Scene text erasing seeks to remove text content from scene images, and current state-of-the-art text erasing models are trained on large-scale synthetic data. Although synthetic data engines can provide vast amounts of annotated training samples, there are differences between synthetic and real-world data. In this paper, we employ self-supervision to learn feature representations on unlabeled real-world scene text images. A novel pretext task is designed to keep the text stroke masks of image variants consistent with one another. We design a Progressive Erasing Network to remove residual text: the scene text is erased progressively by leveraging intermediate generated results, which provide the foundation for subsequent higher-quality results. Experiments show that our method significantly improves the generalization of the text erasing task and achieves state-of-the-art performance on public benchmarks.
Citations: 2
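To make the pretext task concrete, here is a minimal PyTorch sketch (not the authors' code) of enforcing consistency between the text stroke masks predicted for two photometric variants of the same unlabeled image; the tiny StrokeMaskNet, the augmentation, and the L1 consistency loss are illustrative assumptions.

```python
# Illustrative sketch of a stroke-mask consistency pretext task.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StrokeMaskNet(nn.Module):
    """Tiny stand-in for a text-stroke segmentation head."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # one-channel stroke-mask logits
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))

def photometric_variant(img):
    """Photometric-only augmentation, so the stroke mask stays pixel-aligned."""
    brightness = 1.0 + 0.2 * (torch.rand(1) - 0.5)
    noise = 0.02 * torch.randn_like(img)
    return (img * brightness + noise).clamp(0, 1)

def consistency_loss(mask_a, mask_b):
    """Penalize disagreement between the stroke masks of the two variants."""
    return F.l1_loss(mask_a, mask_b)

if __name__ == "__main__":
    model = StrokeMaskNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    unlabeled = torch.rand(4, 3, 64, 64)        # a batch of real-world crops
    m1 = model(photometric_variant(unlabeled))  # mask of variant 1
    m2 = model(photometric_variant(unlabeled))  # mask of variant 2
    loss = consistency_loss(m1, m2)
    loss.backward()
    opt.step()
    print(float(loss))
```

Using photometric-only augmentations keeps the two masks pixel-aligned, so a simple per-pixel loss suffices; geometric augmentations would require warping one mask into the other's frame before comparison.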
An Efficient Framework for Few-shot Skeleton-based Temporal Action Segmentation
Comput. Vis. Image Underst. Pub Date: 2022-07-20 DOI: 10.48550/arXiv.2207.09925
Leiyang Xu, Qianqian Wang, Xiaotian Lin, Lin Yuan
Temporal action segmentation (TAS) aims to classify and locate actions in long untrimmed action sequences. With the success of deep learning, many deep models for action segmentation have emerged; however, few-shot TAS remains a challenging problem. This study proposes an efficient framework for few-shot skeleton-based TAS, comprising a data augmentation method and an improved model. The data augmentation approach, based on motion interpolation, addresses the problem of insufficient data and can significantly increase the number of samples by synthesizing action sequences. In addition, we concatenate a Connectionist Temporal Classification (CTC) layer with a network designed for skeleton-based TAS to obtain an optimized model. Leveraging CTC enhances the temporal alignment between predictions and ground truth and further improves the segment-wise metrics of the segmentation results. Extensive experiments on public and self-constructed datasets, including two small-scale datasets and one large-scale dataset, show the effectiveness of the two proposed methods in improving the performance of the few-shot skeleton-based TAS task.
Citations: 3
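The CTC component described in the abstract can be sketched as follows; this is a hedged PyTorch illustration with a stand-in frame-wise classifier and toy dimensions, not the authors' skeleton-based TAS network.

```python
# Illustrative sketch: CTC loss over frame-wise skeleton predictions.
import torch
import torch.nn as nn

num_classes = 5                      # assumed number of action classes
T, B, J = 100, 2, 17                 # frames, batch size, joints (toy sizes)

# Stand-in frame-wise classifier over flattened 3D joint coordinates.
frame_classifier = nn.Sequential(
    nn.Linear(J * 3, 64), nn.ReLU(),
    nn.Linear(64, num_classes + 1),  # +1 output for the CTC blank symbol
)

skeletons = torch.randn(T, B, J * 3)                          # (time, batch, features)
log_probs = frame_classifier(skeletons).log_softmax(dim=-1)   # (T, B, C+1)

# Ordered action labels per sequence (e.g. "reach, grasp, place"); the label
# sequence is much shorter than the frame sequence, which is what CTC handles.
targets = torch.tensor([[1, 3, 2], [2, 4, 1]])
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), targets.shape[1], dtype=torch.long)

ctc = nn.CTCLoss(blank=num_classes)              # last class index is blank
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(float(loss))
```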
SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection
Comput. Vis. Image Underst. Pub Date: 2022-07-16 DOI: 10.48550/arXiv.2207.08003
Antonio Bărbălău, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, J. Dueholm, B. Ramachandra, Kamal Nasrollahi, F. Khan, T. Moeslund, M. Shah
A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework and propose several updates to the original method. First, we study various detection methods, e.g. detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal: objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers; to this end, we introduce both 2D and 3D convolutional vision transformer (CvT) blocks as alternatives. Third, in our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more data sets, demonstrating that our performance gains are consistent across all data sets. In most cases, our results on Avenue, ShanghaiTech, and UBnormal raise the state-of-the-art performance bar to a new level.
Citations: 19
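As a hedged illustration of the second update (multi-head self-attention inside a 3D convolutional backbone), the PyTorch sketch below flattens a spatio-temporal feature map into tokens, attends over them, and reshapes back; the block is a simplified stand-in, not the paper's exact CvT design.

```python
# Illustrative sketch: self-attention inserted after a 3D convolution.
import torch
import torch.nn as nn

class Conv3dWithSelfAttention(nn.Module):
    def __init__(self, channels=32, heads=4):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, T, H, W)
        x = torch.relu(self.conv(x))
        b, c, t, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, T*H*W, C) token sequence
        normed = self.norm(tokens)
        attended, _ = self.attn(normed, normed, normed)
        tokens = tokens + attended             # residual connection
        return tokens.transpose(1, 2).reshape(b, c, t, h, w)

if __name__ == "__main__":
    block = Conv3dWithSelfAttention()
    clip_features = torch.randn(2, 32, 4, 8, 8)   # toy object-centric clip features
    print(block(clip_features).shape)             # torch.Size([2, 32, 4, 8, 8])
```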
SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification
Comput. Vis. Image Underst. Pub Date: 2022-07-09 DOI: 10.48550/arXiv.2207.04224
Xin Jia, Changlei Dongye, Yan-Tsung Peng
RGB-D salient object detection (SOD) uses depth information to handle challenging scenes and obtain high-quality saliency maps. Existing state-of-the-art RGB-D saliency detection methods overwhelmingly rely on directly fusing depth information. Although these methods improve the accuracy of saliency prediction through various cross-modality fusion strategies, misinformation provided by poor-quality depth images can degrade the saliency prediction result. To address this issue, this paper proposes a novel RGB-D salient object detection model (SiaTrans) that is trained on depth image quality classification at the same time as it is trained on SOD. In light of the information that RGB and depth images share about salient objects, SiaTrans uses a Siamese transformer network with shared weight parameters as the encoder and extracts RGB and depth features concatenated along the batch dimension, saving space resources without compromising performance. SiaTrans uses the class token of the backbone network (T2T-ViT) to classify the quality of depth images without preventing the token sequence from proceeding with the saliency detection task. A transformer-based cross-modality fusion module (CMF) effectively fuses RGB and depth information; during testing, the CMF can choose to fuse cross-modality information or to enhance RGB information according to the quality classification signal for the depth image. The greatest benefit of the designed CMF and decoder is that they maintain the consistency of RGB and RGB-D information decoding: SiaTrans decodes RGB-D or RGB information under the same model parameters according to the classification signal during testing. Comprehensive experiments on nine RGB-D SOD benchmark datasets show that SiaTrans has the best overall performance and the least computation compared with recent state-of-the-art methods.
Citations: 6
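The two mechanisms highlighted in the abstract, a shared-weight encoder applied to RGB and depth concatenated along the batch dimension and a fusion step gated by a predicted depth-quality score, can be sketched in PyTorch as follows; the convolutional encoder and the additive gating are illustrative placeholders for T2T-ViT and the paper's CMF.

```python
# Illustrative sketch: shared encoder over batch-concatenated RGB/depth,
# with depth features gated by a predicted quality score.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.quality_head = nn.Linear(channels, 1)   # depth-quality classifier

    def forward(self, rgb, depth3):                  # depth replicated to 3 channels
        b = rgb.shape[0]
        both = torch.cat([rgb, depth3], dim=0)       # batch-dimension concatenation
        feats = self.backbone(both)                  # one pass, shared weights
        rgb_f, dep_f = feats[:b], feats[b:]
        quality = torch.sigmoid(
            self.quality_head(dep_f.mean(dim=(2, 3))))   # (B, 1) in [0, 1]
        return rgb_f, dep_f, quality

def gated_fusion(rgb_f, dep_f, quality):
    """Fuse depth features only to the extent the depth image looks reliable."""
    q = quality.view(-1, 1, 1, 1)
    return rgb_f + q * dep_f                         # poor depth -> mostly RGB

if __name__ == "__main__":
    enc = SharedEncoder()
    rgb = torch.rand(2, 3, 64, 64)
    depth = torch.rand(2, 1, 64, 64).repeat(1, 3, 1, 1)
    rgb_f, dep_f, quality = enc(rgb, depth)
    print(gated_fusion(rgb_f, dep_f, quality).shape)
```

Batch-dimension concatenation means the second modality costs no extra parameters: the same encoder weights process RGB and depth in a single forward pass.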
PoseGU: 3D Human Pose Estimation with Novel Human Pose Generator and Unbiased Learning
Comput. Vis. Image Underst. Pub Date: 2022-07-07 DOI: 10.48550/arXiv.2207.03618
S. Guan, Haiyan Lu, Linchao Zhu, Gengfa Fang
3D pose estimation has recently gained substantial interest in the computer vision domain. Existing 3D pose estimation methods rely heavily on large, well-annotated 3D pose datasets, and they generalize poorly to unseen poses owing to the limited diversity of 3D poses in the training sets. In this work, we propose PoseGU, a novel human pose generator that produces diverse poses with access to only a small set of seed samples, while employing Counterfactual Risk Minimization to pursue an unbiased evaluation objective. Extensive experiments demonstrate that PoseGU outperforms almost all the state-of-the-art 3D human pose methods under consideration on three popular benchmark datasets. Empirical analysis also shows that PoseGU generates 3D poses with improved data diversity and better generalization ability.
Citations: 2
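Counterfactual Risk Minimization is typically implemented with (clipped) inverse-propensity weighting of the per-sample loss; the PyTorch sketch below illustrates that general idea under the assumption that each generated pose carries the probability with which the generator sampled it, and is not PoseGU's exact objective.

```python
# Illustrative sketch: clipped inverse-propensity-weighted training objective.
import torch

def crm_loss(per_sample_loss, propensities, clip=10.0):
    """Reweight sample losses by clipped inverse propensities."""
    weights = (1.0 / propensities.clamp_min(1e-6)).clamp_max(clip)
    return (weights * per_sample_loss).mean()

if __name__ == "__main__":
    per_sample_loss = torch.tensor([0.8, 0.2, 1.5, 0.4])   # e.g. per-pose errors
    propensities = torch.tensor([0.9, 0.5, 0.05, 0.3])     # assumed generator probs
    print(float(crm_loss(per_sample_loss, propensities)))
```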
Rich global feature guided network for monocular depth estimation
Comput. Vis. Image Underst. Pub Date: 2022-07-01 DOI: 10.2139/ssrn.4057946
Bingyuan Wu, Yongxiong Wang
Citations: 4
CERVI: collaborative editing of raster and vector images
Comput. Vis. Image Underst. Pub Date: 2022-06-23 DOI: 10.1007/s00371-022-02522-1
Ulrike Bath, Sumit Shekhar, Julian Egbert, Julian Schmidt, Amir Semmo, J. Döllner, Matthias Trapp
Citations: 1
Neural network adaption for depth sensor replication
Comput. Vis. Image Underst. Pub Date: 2022-06-23 DOI: 10.1007/s00371-022-02531-0
Christian Kunert, Tobias Schwandt, Christon-Ragavan Nadar, W. Broll
Citations: 1
Physically-admissible polarimetric data augmentation for road-scene analysis
Comput. Vis. Image Underst. Pub Date: 2022-06-01 DOI: 10.48550/arXiv.2206.07431
Cyprien Ruffino, Rachel Blin, Samia Ainouz, G. Gasso, Romain Hérault, F. Mériaudeau, S. Canu
Polarimetric imaging, combined with deep learning, has shown improved performance on various tasks, including scene analysis. However, its robustness may be questioned because of the small size of the available training datasets. Although the issue could be addressed by data augmentation, polarization modalities are subject to physical feasibility constraints that classical data augmentation techniques do not address. To tackle this issue, we propose to use CycleGAN, an image translation technique based on deep generative models that relies solely on unpaired data, to transfer large labeled road scene datasets to the polarimetric domain. We design several auxiliary loss terms that, alongside the CycleGAN losses, handle the physical constraints of polarimetric images. The efficiency of this solution is demonstrated on road scene object detection tasks, where the generated realistic polarimetric images improve car and pedestrian detection performance by up to 9%. The resulting constrained CycleGAN is publicly released, allowing anyone to generate their own polarimetric images.
Citations: 0
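For intuition on what "physical constraints of polarimetric images" can mean in a loss term, the PyTorch sketch below penalizes generator outputs whose four polarizer channels (0°, 45°, 90°, 135°) violate the identity I0 + I90 = I45 + I135 or exceed a degree of linear polarization of 1; the specific form of the penalty is an assumption, not the paper's exact auxiliary losses.

```python
# Illustrative sketch: admissibility penalties for generated polarimetric images.
import torch

def polarimetric_admissibility_loss(fake):
    """fake: (B, 4, H, W) generated intensities for 0/45/90/135 degree filters."""
    i0, i45, i90, i135 = fake[:, 0], fake[:, 1], fake[:, 2], fake[:, 3]
    total_a, total_b = i0 + i90, i45 + i135
    # Both channel pairs must sum to the same total intensity.
    energy_term = (total_a - total_b).abs().mean()
    # Degree of linear polarization must stay below 1.
    q, u = i0 - i90, i45 - i135
    dolp = torch.sqrt(q ** 2 + u ** 2) / (total_a + 1e-6)
    dolp_term = torch.relu(dolp - 1.0).mean()
    return energy_term + dolp_term

if __name__ == "__main__":
    fake_polar = torch.rand(2, 4, 32, 32)   # stand-in generator output
    print(float(polarimetric_admissibility_loss(fake_polar)))
```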
Accurate and efficient salient object detection via position prior attention
Comput. Vis. Image Underst. Pub Date: 2022-06-01 DOI: 10.2139/ssrn.4081836
Jin Zhang, Qiuwei Liang, Yanjiao Shi
Citations: 5