{"title":"Fast Spatial-Temporal Transformer Network","authors":"R. Escher, Rodrigo Andrade de Bem, P. L. J. Drews","doi":"10.1109/sibgrapi54419.2021.00018","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00018","url":null,"abstract":"In computer vision, the restoration of missing regions in an image can be tackled with image inpainting techniques. Neural networks that perform inpainting in videos require the extraction of information from neighboring frames to obtain a temporally coherent result. The state-of-the-art methods for video inpainting are mainly based on Transformer Networks, which rely on attention mechanisms to handle temporal input data. However, such networks are highly costly, requiring considerable computational power for training and testing, which hinders their use on modest computing platforms. In this context, our goal is to reduce the computational complexity of state-of-the-art video inpainting methods, improving performance and facilitating their use on low-end GPUs. Therefore, we introduce the Fast Spatio-Temporal Transformer Network (FastSTTN), an extension of the Spatio-Temporal Transformer Network (STTN) in which the adoption of Reversible Layers reduces memory usage by up to 7 times and execution time by approximately 2.2 times, while maintaining state-of-the-art video inpainting accuracy.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Augmentation Guidelines for Cross-Dataset Transfer Learning and Pseudo Labeling","authors":"Fernando Pereira dos Santos, Gabriela Thume, M. Ponti","doi":"10.1109/sibgrapi54419.2021.00036","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00036","url":null,"abstract":"Convolutional Neural Networks require large amounts of labeled data to be trained. To improve their performance, a widely used practical approach is to augment the training data by generating compatible samples. Standard data augmentation for images includes conventional techniques, such as rotation, shift, and flip. In this paper, we go beyond such methods by studying alternative augmentation procedures for cross-dataset scenarios, in which a source dataset is used for training and a target dataset is used for testing. Through an extensive analysis considering different paradigms, saturation, and combination procedures, we provide guidelines for using augmentation methods in favor of transfer learning scenarios. As a novel approach for self-supervised learning, we also propose using data augmentation techniques as pseudo labels during training. Our techniques prove to be robust alternatives for different domains of transfer learning, including benefiting scenarios for self-supervised learning.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125369367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Segmentation of Posterior Fossa Structures in Pediatric Brain MRIs","authors":"Hugo Oliveira, L. Penteado, Jose Luiz Maciel, S. Ferraciolli, M. Takahashi, I. Bloch, R. M. C. Junior","doi":"10.1109/sibgrapi54419.2021.00025","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00025","url":null,"abstract":"Pediatric brain MRI is a useful tool in assessing the healthy cerebral development of children. Since many pathologies may manifest in the brainstem and cerebellum, the objective of this study was to automate the segmentation of pediatric posterior fossa structures. These pathologies include a myriad of etiologies, from congenital malformations to tumors, which are very prevalent in this age group. We propose a pediatric brain MRI segmentation pipeline composed of preprocessing, semantic segmentation, and post-processing steps. Segmentation modules are composed of two ensembles of networks: generalists and specialists. The generalist networks are responsible for locating and roughly segmenting the brain areas, yielding regions of interest for each target organ. Specialist networks can then improve the segmentation performance for underrepresented organs by learning only from the regions of interest produced by the generalist networks. Finally, post-processing consists of merging the specialist and generalist networks' predictions and performing late fusion across the distinct architectures to generate a final prediction. We conduct a thorough ablation analysis of this pipeline and assess the superiority of the methodology in segmenting the brainstem, 4th ventricle, and cerebellum. The proposed methodology achieved a macro-averaged Dice index of 0.855 with respect to manual segmentation, with only 32 labeled volumes used during training. Additionally, average distances between automatically and manually segmented surfaces remained around 1 mm for the three structures, while volumetry results revealed high agreement between manually labeled and predicted regions.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126615404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combination of Optical Character Recognition Engines for Documents Containing Sparse Text and Alphanumeric Codes","authors":"Iago Correa, P. L. J. Drews, R. Rodrigues","doi":"10.1109/sibgrapi54419.2021.00048","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00048","url":null,"abstract":"Many companies that buy machines, parts, or tools retain documents such as notes, receipts, forms, or instruction manuals over the years, and they may find themselves in need of digitizing these accumulated documents. When applying optical character recognition (OCR) systems to these documents, two main difficulties arise. The first is locating sparse, non-continuous text; the second is recognizing strings that resemble codes rather than words in human language. Although there are many works in the literature about sparse texts, such as forms and tables, they usually do not address codes, for which dictionaries are of no help, nor both problems together. Therefore, to correct this issue without having to search extensive databases or train and develop new models, this work proposes taking advantage of pre-trained OCR models, such as those from the Tesseract engine or the Google Cloud Vision API. To do so, we explore combination strategies, including a new one based on the median string. The experimental results achieved up to 3.09% improvement in character accuracy and 1.16% in word accuracy in comparison to the best individual performances of the engines when our method based on string combination was adopted.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"2020 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121460664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A System for Visual Analysis of Objects Behavior in Surveillance Videos","authors":"Cibele Mara Fonseca, J. G. Paiva","doi":"10.1109/sibgrapi54419.2021.00032","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00032","url":null,"abstract":"Closed-circuit television (CCTV) surveillance systems are employed in different scenarios to prevent a variety of threats, producing a large volume of video footage. Several surveillance tasks consist of detecting/tracking moving objects in the scene to analyze their behavior and comprehend their role in events that occur in the video. Such analysis is unfeasible if performed manually, due to the large volume of long-duration videos, as well as intrinsic human limitations, which may compromise the perception of multiple strategic events. Most smart surveillance approaches designed for moving object analysis focus only on the detection/tracking process, providing a limited comprehension of object behavior, and rely on automatic procedures with little or no user interaction, which may hamper the comprehension of the produced results. Visual analytics techniques may be useful to highlight behavior patterns, improving the comprehension of how the objects contribute to the occurrence of observed events in the video. In this work, we propose a video surveillance visual analysis system for identification/exploration of object behavior and its relationship with event occurrence. We introduce the Appearance Bars layout to perform a temporal analysis of each object's presence in the scene, highlighting the involved dynamics and spatial distribution, as well as its interaction with other objects. Coordinated with other support layouts, these bars represent multiple aspects of object behavior over the video's extent. We demonstrate the utility of our system in surveillance scenarios that show different aspects of object behavior, which we relate to events that occur in the videos.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116014798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Simple and Efficient Object-based Superpixel Delineation Framework","authors":"F. Belém, B. Perret, J. Cousty, S. Guimarães, A. Falcão","doi":"10.1109/sibgrapi54419.2021.00054","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00054","url":null,"abstract":"Superpixel segmentation methods are widely used in computer vision applications due to their properties in border delineation. These methods do not usually take into account any prior object information. Although there are a few exceptions, such methods significantly rely on the quality of the object information provided and present high computational cost in most practical cases. Inspired by such approaches, we propose Object-based Dynamic and Iterative Spanning Forest (ODISF), a novel object-based superpixel segmentation framework to effectively exploit prior object information while being robust to the quality of that information. ODISF consists of three independent steps: (i) seed oversampling; (ii) dynamic path-based superpixel generation; and (iii) object-based seed removal. After (i), steps (ii) and (iii) are repeated until the desired number of superpixels is finally reached. Experimental results show that ODISF can surpass state-of-the-art methods according to several metrics, while being significantly faster than its object-based counterparts.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131683651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing the need for bounding box annotations in Object Detection using Image Classification data","authors":"Leonardo Blanger, N. Hirata, Xiaoyi Jiang","doi":"10.1109/sibgrapi54419.2021.00035","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00035","url":null,"abstract":"We address the problem of training Object Detection models using significantly fewer bounding box annotated images. For that, we take advantage of cheaper and more abundant image classification data. Our proposal consists of automatically generating artificial detection samples, with no need for expensive detection-level supervision, using images with classification labels only. We also detail a pretraining initialization strategy for detection architectures using these artificially synthesized samples, before finetuning on real detection data, and experimentally show how this consistently leads to more data-efficient models. With the proposed approach, we were able to effectively use only classification data to improve results on the harder and more supervision-hungry object detection problem. We achieve results equivalent to those of the full data scenario using only a small fraction of the original detection data for Face, Bird, and Car detection.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130835569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BORDE: Boundary and Sub-Region Denormalization for Semantic Brain Image Synthesis","authors":"Israel N. Chaparro-Cruz, Javier A. Montoya-Zegarra","doi":"10.1109/sibgrapi54419.2021.00020","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00020","url":null,"abstract":"Medical images are often expensive to acquire and of limited use due to legal issues, besides the lack of consistency and availability of image annotations. Thus, the use of medical datasets can be restrictive for training deep learning models. The generation of synthetic images along with their corresponding annotations can therefore help to solve this issue. In this paper, we propose a novel Generative Adversarial Network (GAN) generator for multimodal semantic image synthesis of brain images, based on a novel denormalization block named BOundary and sub-Region DEnormalization (BORDE). The new architecture consists of a decoder generator that allows: (i) effective sequential propagation of a-priori semantic information through the generator, (ii) noise injection at different scales to avoid mode collapse, and (iii) the generation of rich and diverse multimodal synthetic samples along with their contours. Our model generates very realistic and plausible synthetic images that, when combined with real data, help to improve accuracy in brain segmentation tasks. Quantitative and qualitative results on challenging multimodal brain imaging datasets (BraTS 2020 [1] and ISLES 2018 [2]) demonstrate the advantages of our model over existing image-agnostic state-of-the-art techniques, improving segmentation and semantic image synthesis tasks. This demonstrates the need for more domain-specific techniques in GAN models.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116342598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-local medians filter for joint Gaussian and impulsive image denoising","authors":"A. Levada","doi":"10.1109/sibgrapi54419.2021.00029","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00029","url":null,"abstract":"Image denoising concerns the development of filters to remove or attenuate random perturbations in the observed data while, at the same time, preserving most of the edges and fine details in the scene. One problem with joint additive Gaussian and impulsive noise degradation is that it is spread over all frequencies of the signal. Hence, the most effective filters for this kind of noise are implemented in the spatial domain. In this paper, we propose a Non-Local Medians filter that combines the medians of every patch of a search window using two distinct similarity measures: the Euclidean distance and the Kullback-Leibler divergence between Gaussian densities estimated from the patches. Computational experiments with 25 images corrupted by joint Gaussian and impulsive noise show that the proposed method is capable of producing, on average, significantly higher PSNR and SSIM than the combination of the median filter and the Non-Local Means filter applied independently.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130389758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simplifying Horizon Picking Using Single-Class Semantic Segmentation Networks","authors":"Danilo Calhes, F. Kobayashi, Andréa Britto Mattos, M. Macedo, Dário A. B. Oliveira","doi":"10.1109/sibgrapi54419.2021.00046","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00046","url":null,"abstract":"Seismic image processing plays a significant role in geological exploration, as it conditions much of the interpretation performance. The interpretation process comprises several tasks, and Horizon Picking is one of the most time-consuming. Hence, several works have proposed methods for picking horizons automatically, mostly focusing on increasing the accuracy of data-driven approaches by employing, for instance, semantic segmentation networks. However, these works often rely on a training process that requires several annotated samples, which are known to be scarce in the seismic domain, due to the overwhelming effort associated with manually picking several horizons in a seismic cube. This paper aims to evaluate the simplification of the labeling process required for training, by using training samples composed of disconnected horizon tokens, therefore relaxing the requirement of annotating the full set of horizons in each training sample, as commonly observed in previous works employing semantic segmentation networks. We assessed two state-of-the-art neural networks for general-purpose domains (PSP-Net and Deeplab V3+) using public seismic data (Netherlands F3 Block dataset). Our results report a minor impact on performance using our proposed incomplete token training scheme compared to the complete one; moreover, these networks outperform the current state of the art for horizon picking from small training sets. Thus, our approach proves to be advantageous for the interpreter, given that using partial annotations instead of providing a full annotation can reduce the user effort during the labeling process required for training the models.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128077690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}