{"title":"SF2SE3: Clustering Scene Flow into SE(3)-Motions via Proposal and Selection","authors":"Leonhard Sommer, Philipp Schröppel, T. Brox","doi":"10.1007/978-3-031-16788-1_14","DOIUrl":"https://doi.org/10.1007/978-3-031-16788-1_14","url":null,"abstract":"","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126270875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diverse Video Captioning by Adaptive Spatio-temporal Attention","authors":"Zohreh Ghaderi, Leonard Salewski, H. Lensch","doi":"10.48550/arXiv.2208.09266","DOIUrl":"https://doi.org/10.48550/arXiv.2208.09266","url":null,"abstract":"To generate proper captions for videos, the inference needs to identify relevant concepts and pay attention to the spatial relationships between them as well as to the temporal development in the clip. Our end-to-end encoder-decoder video captioning framework incorporates two transformer-based architectures, an adapted transformer for a single joint spatio-temporal video analysis as well as a self-attention-based decoder for advanced text generation. Furthermore, we introduce an adaptive frame selection scheme to reduce the number of required incoming frames while maintaining the relevant content when training both transformers. Additionally, we estimate semantic concepts relevant for video captioning by aggregating all ground truth captions of each sample. Our approach achieves state-of-the-art results on the MSVD, as well as on the large-scale MSR-VTT and the VATEX benchmark datasets considering multiple Natural Language Generation (NLG) metrics. Additional evaluations on diversity scores highlight the expressiveness and diversity in the structure of our generated captions.","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114777663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global Hierarchical Attention for 3D Point Cloud Analysis","authors":"Dan Jia, Alexander Hermans, Bastian Leibe","doi":"10.48550/arXiv.2208.03791","DOIUrl":"https://doi.org/10.48550/arXiv.2208.03791","url":null,"abstract":". We propose a new attention mechanism, called Global Hierarchical Attention (GHA), for 3D point cloud analysis. GHA approximates the regular global dot-product attention via a series of coarsening and interpolation operations over multiple hierarchy levels. The advantage of GHA is two-fold. First, it has linear complexity with respect to the number of points, enabling the processing of large point clouds. Second, GHA inherently possesses the inductive bias to focus on spatially close points, while retaining the global connectivity among all points. Combined with a feedforward network, GHA can be inserted into many existing network architectures. We experiment with multiple baseline networks and show that adding GHA consistently improves performance across different tasks and datasets. For the task of semantic segmentation, GHA gives a +1.7% mIoU increase to the MinkowskiEngine baseline on ScanNet. For the 3D object detection task, GHA improves the CenterPoint baseline by +0.5% mAP on the nuScenes dataset, and the 3DETR baseline by +2.1% mAP 25 and +1.5% mAP 50 on ScanNet.","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127968022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmentation Learning for Semi-Supervised Classification","authors":"Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne","doi":"10.48550/arXiv.2208.01956","DOIUrl":"https://doi.org/10.48550/arXiv.2208.01956","url":null,"abstract":"Recently, a number of new Semi-Supervised Learning methods have emerged. As the accuracy for ImageNet and similar datasets increased over time, the performance on tasks beyond the classification of natural images is yet to be explored. Most Semi-Supervised Learning methods rely on a carefully manually designed data augmentation pipeline that is not transferable for learning on images of other domains. In this work, we propose a Semi-Supervised Learning method that automatically selects the most effective data augmentation policy for a particular dataset. We build upon the Fixmatch method and extend it with meta-learning of augmentations. The augmentation is learned in additional training before the classification training and makes use of bi-level optimization, to optimize the augmentation policy and maximize accuracy. We evaluate our approach on two domain-specific datasets, containing satellite images and hand-drawn sketches, and obtain state-of-the-art results. We further investigate in an ablation the different parameters relevant for learning augmentation policies and show how policy learning can be used to adapt augmentations to datasets beyond ImageNet.","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124952434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ArtFID: Quantitative Evaluation of Neural Style Transfer","authors":"Matthias Wright, B. Ommer","doi":"10.48550/arXiv.2207.12280","DOIUrl":"https://doi.org/10.48550/arXiv.2207.12280","url":null,"abstract":"The field of neural style transfer has experienced a surge of research exploring different avenues ranging from optimization-based approaches and feed-forward models to meta-learning methods. The developed techniques have not just progressed the field of style transfer, but also led to breakthroughs in other areas of computer vision, such as all of visual synthesis. However, whereas quantitative evaluation and benchmarking have become pillars of computer vision research, the reproducible, quantitative assessment of style transfer models is still lacking. Even in comparison to other fields of visual synthesis, where widely used metrics exist, the quantitative evaluation of style transfer is still lagging behind. To support the automatic comparison of different style transfer approaches and to study their respective strengths and weaknesses, the field would greatly benefit from a quantitative measurement of stylization performance. Therefore, we propose a method to complement the currently mostly qualitative evaluation schemes. We provide extensive evaluations and a large-scale user study to show that the proposed metric strongly coincides with human judgment.","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125289525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bhattacharyya Coefficient-Based Framework for Noise Model-Aware Random Walker Image Segmentation","authors":"Dominik Drees, Florian Eilers, Ang Bian, Xiaoyi Jiang","doi":"10.48550/arXiv.2206.00947","DOIUrl":"https://doi.org/10.48550/arXiv.2206.00947","url":null,"abstract":". One well established method of interactive image segmentation is the random walker algorithm. Considerable research on this family of segmentation methods has been continuously conducted in recent years with numerous applications. These methods are common in using a simple Gaussian weight function which depends on a parameter that strongly influences the segmentation performance. In this work we propose a general framework of deriving weight functions based on probabilistic modeling. This framework can be concretized to cope with virtually any well-defined noise model. It eliminates the critical parameter and thus avoids time-consuming parameter search. We derive the specific weight functions for common noise types and show their superior performance on synthetic data as well as different biomedical image data (MRI images from the NYU fastMRI dataset, larvae images acquired with the FIM technique). Our framework can also be used in multiple other applications, e.g., the graph cut algorithm and its extensions.","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124747065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Localized Vision-Language Matching for Open-vocabulary Object Detection","authors":"M. A. Bravo, Sudhanshu Mittal, T. Brox","doi":"10.48550/arXiv.2205.06160","DOIUrl":"https://doi.org/10.48550/arXiv.2205.06160","url":null,"abstract":"In this work, we propose an open-vocabulary object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes the model for the object detection task using known class annotations. We show that a simple language model fits better than a large contextualized language model for detecting novel objects. Moreover, we introduce a consistency-regularization technique to better exploit image-caption pair information. Our method compares favorably to existing open-vocabulary detection approaches while being data-efficient. Source code is available at https://github.com/lmb-freiburg/locov .","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126194403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable Prediction of Pulmonary Hypertension in Newborns using Echocardiograms","authors":"H. Ragnarsdóttir, Laura Manduchi, H. Michel, F. Laumer, S. Wellmann, Ece Ozkan, Julia-Franziska Vogt","doi":"10.48550/arXiv.2203.13038","DOIUrl":"https://doi.org/10.48550/arXiv.2203.13038","url":null,"abstract":". Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Therefore, ac-curate and early detection of PH is crucial for successful management. Using echocardiography, the primary diagnostic tool in pediatrics, human assessment is both time-consuming and expertise-demanding, rais-ing the need for an automated approach. In this work, we present an interpretable multi-view video-based deep learning approach to predict PH for a cohort of 194 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice.","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128390213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Edge Detection for Image Segmentation with Multicut Penalties","authors":"Steffen Jung, Sebastian Ziegler, Amirhossein Kardoost, M. Keuper","doi":"10.1007/978-3-031-16788-1_12","DOIUrl":"https://doi.org/10.1007/978-3-031-16788-1_12","url":null,"abstract":"","PeriodicalId":221267,"journal":{"name":"German Conference on Pattern Recognition","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133305334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}