Comput. Vis. Image Underst.: Latest Publications

Fine-grained bidirectional attentional generation and knowledge-assisted networks for cross-modal retrieval
Comput. Vis. Image Underst. Pub Date: 2022-06-01 DOI: 10.2139/ssrn.4072473
Jianwei Zhu, Zhixin Li, Jiahui Wei, Yukun Zeng, Huifang Ma
{"title":"Fine-grained bidirectional attentional generation and knowledge-assisted networks for cross-modal retrieval","authors":"Jianwei Zhu, Zhixin Li, Jiahui Wei, Yukun Zeng, Huifang Ma","doi":"10.2139/ssrn.4072473","DOIUrl":"https://doi.org/10.2139/ssrn.4072473","url":null,"abstract":"","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75941441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Improved real-time three-dimensional stereo matching with local consistency
Comput. Vis. Image Underst. Pub Date: 2022-06-01 DOI: 10.2139/ssrn.4085389
Xiaoqian Ye, B. Yan, Boyang Liu, Huachun Wang, Shuai Qi, Duo Chen, Peng Wang, Kuiru Wang, X. Sang
{"title":"Improved real-time three-dimensional stereo matching with local consistency","authors":"Xiaoqian Ye, B. Yan, Boyang Liu, Huachun Wang, Shuai Qi, Duo Chen, Peng Wang, Kuiru Wang, X. Sang","doi":"10.2139/ssrn.4085389","DOIUrl":"https://doi.org/10.2139/ssrn.4085389","url":null,"abstract":"","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89047922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Reinforced Pedestrian Attribute Recognition with Group Optimization Reward
Comput. Vis. Image Underst. Pub Date: 2022-05-21 DOI: 10.48550/arXiv.2205.14042
Zhong Ji, Zhenfei Hu, Yaodong Wang, Shengjia Li
{"title":"Reinforced Pedestrian Attribute Recognition with Group Optimization Reward","authors":"Zhong Ji, Zhenfei Hu, Yaodong Wang, Shengjia Li","doi":"10.48550/arXiv.2205.14042","DOIUrl":"https://doi.org/10.48550/arXiv.2205.14042","url":null,"abstract":"Pedestrian Attribute Recognition (PAR) is a challenging task in intelligent video surveillance. Two key challenges in PAR include complex alignment relations between images and attributes, and imbalanced data distribution. Existing approaches usually formulate PAR as a recognition task. Different from them, this paper addresses it as a decision-making task via a reinforcement learning framework. Specifically, PAR is formulated as a Markov decision process (MDP) by designing ingenious states, action space, reward function and state transition. To alleviate the inter-attribute imbalance problem, we apply an Attribute Grouping Strategy (AGS) by dividing all attributes into subgroups according to their region and category information. Then we employ an agent to recognize each group of attributes, which is trained with Deep Q-learning algorithm. We also propose a Group Optimization Reward (GOR) function to alleviate the intra-attribute imbalance problem. Experimental results on the three benchmark datasets of PETA, RAP and PA100K illustrate the effectiveness and competitiveness of the proposed approach and demonstrate that the application of reinforcement learning to PAR is a valuable research direction.","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82958034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Fast, accurate and robust registration of multiple depth sensors without need for RGB and IR images
Comput. Vis. Image Underst. Pub Date: 2022-05-17 DOI: 10.1007/s00371-022-02505-2
Andre Mühlenbrock, Roland Fischer, Christoph Schröder‐Dering, René Weller, G. Zachmann
{"title":"Fast, accurate and robust registration of multiple depth sensors without need for RGB and IR images","authors":"Andre Mühlenbrock, Roland Fischer, Christoph Schröder‐Dering, René Weller, G. Zachmann","doi":"10.1007/s00371-022-02505-2","DOIUrl":"https://doi.org/10.1007/s00371-022-02505-2","url":null,"abstract":"","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72984378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Continual learning on 3D point clouds with random compressed rehearsal
Comput. Vis. Image Underst. Pub Date: 2022-05-16 DOI: 10.48550/arXiv.2205.08013
M. Zamorski, Michal Stypulkowski, Konrad Karanowski, Tomasz Trzciński, Maciej Zięba
{"title":"Continual learning on 3D point clouds with random compressed rehearsal","authors":"M. Zamorski, Michal Stypulkowski, Konrad Karanowski, Tomasz Trzci'nski, Maciej Ziȩba","doi":"10.48550/arXiv.2205.08013","DOIUrl":"https://doi.org/10.48550/arXiv.2205.08013","url":null,"abstract":"Contemporary deep neural networks offer state-of-the-art results when applied to visual reasoning, e.g., in the context of 3D point cloud data. Point clouds are important datatype for precise modeling of three-dimensional environments, but effective processing of this type of data proves to be challenging. In the world of large, heavily-parameterized network architectures and continuously-streamed data, there is an increasing need for machine learning models that can be trained on additional data. Unfortunately, currently available models cannot fully leverage training on additional data without losing their past knowledge. Combating this phenomenon, called catastrophic forgetting, is one of the main objectives of continual learning. Continual learning for deep neural networks has been an active field of research, primarily in 2D computer vision, natural language processing, reinforcement learning, and robotics. However, in 3D computer vision, there are hardly any continual learning solutions specifically designed to take advantage of point cloud structure. This work proposes a novel neural network architecture capable of continual learning on 3D point cloud data. We utilize point cloud structure properties for preserving a heavily compressed set of past data. By using rehearsal and reconstruction as regularization methods of the learning process, our approach achieves a significant decrease of catastrophic forgetting compared to the existing solutions on several most popular point cloud datasets considering two continual learning settings: when a task is known beforehand, and in the challenging scenario of when task information is unknown to the model.","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76712660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Comput. Vis. Image Underst. Pub Date: 2022-05-05 DOI: 10.48550/arXiv.2205.02717
Mingdong Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, Limin Wang
{"title":"BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection","authors":"Mingdong Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, Limin Wang","doi":"10.48550/arXiv.2205.02717","DOIUrl":"https://doi.org/10.48550/arXiv.2205.02717","url":null,"abstract":"Temporal action detection (TAD) is extensively studied in the video understanding community by generally following the object detection pipeline in images. However, complex designs are not uncommon in TAD, such as two-stream feature extraction, multi-stage training, complex temporal modeling, and global context fusion. In this paper, we do not aim to introduce any novel technique for TAD. Instead, we study a simple, straightforward, yet must-known baseline given the current status of complex design and low detection efficiency in TAD. In our simple baseline (termed BasicTAD), we decompose the TAD pipeline into several essential components: data sampling, backbone design, neck construction, and detection head. We extensively investigate the existing techniques in each component for this baseline, and more importantly, perform end-to-end training over the entire pipeline thanks to the simplicity of design. As a result, this simple BasicTAD yields an astounding and real-time RGB-Only baseline very close to the state-of-the-art methods with two-stream inputs. In addition, we further improve the BasicTAD by preserving more temporal and spatial information in network representation (termed as PlusTAD). Empirical results demonstrate that our PlusTAD is very efficient and significantly outperforms the previous methods on the datasets of THUMOS14 and FineAction. Meanwhile, we also perform in-depth visualization and error analysis on our proposed method and try to provide more insights on the TAD problem. Our approach can serve as a strong baseline for future TAD research. The code and model will be released at https://github.com/MCG-NJU/BasicTAD.","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74452018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
Semantically Accurate Super-Resolution Generative Adversarial Networks
Comput. Vis. Image Underst. Pub Date: 2022-05-01 DOI: 10.48550/arXiv.2205.08659
Tristan Frizza, D. Dansereau, Nagita Mehr Seresht, M. Bewley
{"title":"Semantically Accurate Super-Resolution Generative Adversarial Networks","authors":"Tristan Frizza, D. Dansereau, Nagita Mehr Seresht, M. Bewley","doi":"10.48550/arXiv.2205.08659","DOIUrl":"https://doi.org/10.48550/arXiv.2205.08659","url":null,"abstract":"This work addresses the problems of semantic segmentation and image super-resolution by jointly considering the performance of both in training a Generative Adversarial Network (GAN). We propose a novel architecture and domain-specific feature loss, allowing super-resolution to operate as a pre-processing step to increase the performance of downstream computer vision tasks, specifically semantic segmentation. We demonstrate this approach using Nearmap’s aerial imagery dataset which covers hundreds of urban areas at 5-7 cm per pixel resolution. We show the proposed approach improves perceived image quality as well as quantitative segmentation accuracy across all prediction classes, yielding an average accuracy improvement of 11.8% and 108% at 4 × and 32 × super-resolution, compared with state-of-the art single-network methods. This work demonstrates that jointly considering image-based and task-specific losses can improve the performance of both, and advances the state-of-the-art in semantic-aware super-resolution of aerial imagery. 1: A comparison of of three potential generator model architec- tures for 4 × super-resolution. We chose RRDN for all subsequent ex-periments due to its superior overall performance on pixel-wise loss","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74313737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Real-time semantic segmentation with local spatial pixel adjustment
Comput. Vis. Image Underst. Pub Date: 2022-05-01 DOI: 10.2139/ssrn.4053470
Cunjun Xiao, Xingjun Hao, Haibin Li, Yaqian Li, Wengming Zhang
{"title":"Real-time semantic segmentation with local spatial pixel adjustment","authors":"Cunjun Xiao, Xingjun Hao, Haibin Li, Yaqian Li, Wengming Zhang","doi":"10.2139/ssrn.4053470","DOIUrl":"https://doi.org/10.2139/ssrn.4053470","url":null,"abstract":"","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75904430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
End-to-end weakly-supervised single-stage multiple 3D hand mesh reconstruction from a single RGB image
Comput. Vis. Image Underst. Pub Date: 2022-04-18 DOI: 10.2139/ssrn.4199294
Jinwei Ren, Jianke Zhu, Jialiang Zhang
{"title":"End-to-end weakly-supervised single-stage multiple 3D hand mesh reconstruction from a single RGB image","authors":"Jinwei Ren, Jianke Zhu, Jialiang Zhang","doi":"10.2139/ssrn.4199294","DOIUrl":"https://doi.org/10.2139/ssrn.4199294","url":null,"abstract":"In this paper, we consider the challenging task of simultaneously locating and recovering multiple hands from a single 2D image. Previous studies either focus on single hand reconstruction or solve this problem in a multi-stage way. Moreover, the conventional two-stage pipeline firstly detects hand areas, and then estimates 3D hand pose from each cropped patch. To reduce the computational redundancy in preprocessing and feature extraction, for the first time, we propose a concise but efficient single-stage pipeline for multi-hand reconstruction. Specifically, we design a multi-head auto-encoder structure, where each head network shares the same feature map and outputs the hand center, pose and texture, respectively. Besides, we adopt a weakly-supervised scheme to alleviate the burden of expensive 3D real-world data annotations. To this end, we propose a series of losses optimized by a stage-wise training scheme, where a multi-hand dataset with 2D annotations is generated based on the publicly available single hand datasets. In order to further improve the accuracy of the weakly supervised model, we adopt several feature consistency constraints in both single and multiple hand settings. Specifically, the keypoints of each hand estimated from local features should be consistent with the re-projected points predicted from global features. Extensive experiments on public benchmarks including FreiHAND, HO3D, InterHand2.6M and RHD demonstrate that our method outperforms the state-of-the-art model-based methods in both weakly-supervised and fully-supervised manners. The code and models are available at {https://github.com/zijinxuxu/SMHR}.","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73698711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Video Captioning: a comparative review of where we are and which could be the route
Comput. Vis. Image Underst. Pub Date: 2022-04-12 DOI: 10.48550/arXiv.2204.05976
Daniela Moctezuma, Tania Ramírez-delReal, Guillermo Ruiz, Othón González-Chávez
{"title":"Video Captioning: a comparative review of where we are and which could be the route","authors":"Daniela Moctezuma, Tania Ram'irez-delReal, Guillermo Ruiz, Oth'on Gonz'alez-Ch'avez","doi":"10.48550/arXiv.2204.05976","DOIUrl":"https://doi.org/10.48550/arXiv.2204.05976","url":null,"abstract":"Video captioning is the process of describing the content of a sequence of images capturing its semantic relationships and meanings. Dealing with this task with a single image is arduous, not to mention how difficult it is for a video (or images sequence). The amount and relevance of the applications of video captioning are vast, mainly to deal with a significant amount of video recordings in video surveillance, or assisting people visually impaired, to mention a few. To analyze where the efforts of our community to solve the video captioning task are, as well as what route could be better to follow, this manuscript presents an extensive review of more than 105 papers for the period of 2016 to 2021. As a result, the most-used datasets and metrics are identified. Also, the main approaches used and the best ones. We compute a set of rankings based on several performance metrics to obtain, according to its performance, the best method with the best result on the video captioning task. Finally, some insights are concluded about which could be the next steps or opportunity areas to improve dealing with this complex task.","PeriodicalId":10549,"journal":{"name":"Comput. Vis. Image Underst.","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86286089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3