{"title":"BRCars: a Dataset for Fine-Grained Classification of Car Images","authors":"Daniel M. Kuhn, V. Moreira","doi":"10.1109/sibgrapi54419.2021.00039","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00039","url":null,"abstract":"Fine-grained computer vision tasks refer to the ability to distinguish objects that belong to the same parent class and differ only in subtle visual elements. Image classification of car models is considered a fine-grained classification task. In this work, we introduce BRCars, a dataset that seeks to replicate the main challenges inherent to the task of classifying car images in many practical applications. BRCars contains around 300K images collected from a Brazilian car advertising website. The images correspond to 52K car instances distributed among 427 different models. They depict both the exterior and the interior of the cars, present an unbalanced distribution across the different models, and lack standardization in terms of perspective. We adopted a semi-automated annotation pipeline with the help of the CLIP neural network, which enabled us to distinguish among different perspectives for thousands of images using textual queries. Experiments with standard deep learning classifiers were performed to serve as baseline results for future work on this topic. The BRCars dataset is available at https://github.com/danimtk/brcars-dataset.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125565860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
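The CLIP-based annotation step described in the abstract above relies on comparing an image embedding against embeddings of textual queries. A minimal sketch of that zero-shot matching, assuming the embeddings have already been computed (the toy vectors and query labels below are hypothetical illustrations, not data from the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def classify_perspective(image_emb, query_embs):
    """Assign the textual query whose embedding is closest to the image embedding."""
    return max(query_embs, key=lambda q: cosine(image_emb, query_embs[q]))

# Hypothetical 2-D embeddings; real CLIP vectors are high-dimensional.
queries = {"exterior": [1.0, 0.0], "interior": [0.0, 1.0]}
print(classify_perspective([0.9, 0.1], queries))
```

With real CLIP encoders, `query_embs` would hold encodings of prompts such as "a photo of a car interior", and thousands of images could be tagged by this nearest-query rule without per-image labels.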
{"title":"Learning-based End-to-End Video Compression Using Predictive Coding","authors":"Matheus C. de Oliveira, L. Martins, H. C. Jung, Nilson Donizete Guerin Júnior, R. C. D. Silva, Eduardo Peixoto, B. Macchiavello, E. Hung, Vanessa Testoni, P. Freitas","doi":"10.1109/sibgrapi54419.2021.00030","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00030","url":null,"abstract":"Driven by the growing demand for video applications, deep learning techniques have become alternatives for implementing end-to-end encoders that achieve practical compression rates. Conventional video codecs exploit both spatial and temporal correlation. However, due to restrictions such as computational complexity, they are commonly limited to linear transformations and translational motion estimation. Autoencoder models pave the way for predictive end-to-end video codecs without such limitations. This paper presents a fully learning-based video codec that exploits spatial and temporal correlations. The presented codec extends the idea of P-frame prediction presented in our previous work. The architecture adopted for I-frame coding is defined by a variational autoencoder with non-parametric entropy modeling. Besides an entropy model parameterized by a hyperprior, the inter-frame encoder architecture has two other independent networks, responsible for motion estimation and residue prediction. Experimental results indicate that further improvements must be incorporated into our codec before it can outperform the all-intra coding setup of the traditional High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) algorithms.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115978409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
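The predictive-coding idea behind P-frame compression (a prediction plus a coded residue) can be illustrated with a toy scalar pipeline. This is a hedged sketch of the general principle, not the paper's learned motion-estimation and residue networks; the quantization step size of 4 is an arbitrary choice for illustration:

```python
def encode_pframe(frame, prediction, step=4):
    """Quantized residual between the frame and its (e.g. motion-compensated) prediction."""
    return [round((f - p) / step) for f, p in zip(frame, prediction)]

def decode_pframe(prediction, qresidual, step=4):
    """Reconstruct the frame from the prediction plus the dequantized residual."""
    return [p + q * step for p, q in zip(prediction, qresidual)]

frame = [100, 102, 98, 120]      # toy pixel values
prediction = [101, 100, 100, 110]  # toy prediction of the frame
recon = decode_pframe(prediction, encode_pframe(frame, prediction))
```

Only the small quantized residual needs to be entropy-coded; round-to-nearest quantization bounds the reconstruction error by half the step size.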
{"title":"Entropic Laplacian eigenmaps for unsupervised metric learning","authors":"A. Levada, M. Haddad","doi":"10.1109/sibgrapi54419.2021.00049","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00049","url":null,"abstract":"Unsupervised metric learning is concerned with building adaptive distance functions prior to pattern classification. Laplacian eigenmaps is a manifold learning algorithm that uses dimensionality reduction to find more compact and meaningful representations of datasets through the Laplacian matrix of graphs. In this paper, we propose the entropic Laplacian eigenmaps (ELAP) algorithm, a parametric approach that employs the Kullback–Leibler (KL) divergence between patches of the KNN graph, instead of the pointwise Euclidean metric, as the cost function for the graph weights. Our objective with this modification is to increase the robustness of Laplacian eigenmaps against noise and outliers. Our results on various real-world datasets indicate that the proposed method generates more reasonable clusters while achieving greater classification accuracies than existing widely adopted methods for dimensionality reduction-based metric learning.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126823372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
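The core ELAP ingredient described above can be sketched as follows: fit a Gaussian to each kNN patch and weight graph edges by a symmetrized KL divergence instead of a Euclidean distance. Diagonal covariances are assumed here purely for simplicity; the paper does not specify this exact parameterization:

```python
import math

def patch_stats(patch):
    """Fit a diagonal Gaussian (per-dimension mean and std) to a kNN patch of points."""
    n, d = len(patch), len(patch[0])
    mean = [sum(p[i] for p in patch) / n for i in range(d)]
    var = [sum((p[i] - mean[i]) ** 2 for p in patch) / n for i in range(d)]
    std = [math.sqrt(v) + 1e-9 for v in var]  # regularized to avoid zero variance
    return mean, std

def kl_diag_gauss(m0, s0, m1, s1):
    """KL( N(m0, diag(s0^2)) || N(m1, diag(s1^2)) ), closed form per dimension."""
    return 0.5 * sum(
        math.log((b * b) / (a * a)) + (a * a + (x - y) ** 2) / (b * b) - 1.0
        for x, a, y, b in zip(m0, s0, m1, s1)
    )

def edge_weight(patch_i, patch_j):
    """Symmetrized KL divergence as the weight of a KNN-graph edge."""
    mi, si = patch_stats(patch_i)
    mj, sj = patch_stats(patch_j)
    return 0.5 * (kl_diag_gauss(mi, si, mj, sj) + kl_diag_gauss(mj, sj, mi, si))
```

Identical patches yield weight zero, and the weight grows with the statistical divergence between neighborhoods, which is the intended robustness mechanism against noise and outliers.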
{"title":"ConformalLayers: A non-linear sequential neural network with associative layers","authors":"E. V. Sousa, Leandro A. F. Fernandes, C. Vasconcelos","doi":"10.1109/SIBGRAPI54419.2021.00059","DOIUrl":"https://doi.org/10.1109/SIBGRAPI54419.2021.00059","url":null,"abstract":"Convolutional Neural Networks (CNNs) have been widely applied, but as CNNs grow, their number of arithmetic operations and memory footprint also increase. Furthermore, typical non-linear activation functions do not allow associativity of the operations encoded by consecutive layers, preventing the simplification of intermediate steps by combining them. We present a new activation function that allows associativity between sequential layers of CNNs. Even though our activation function is non-linear, it can be represented by a sequence of linear operations in the conformal model for Euclidean geometry. In this domain, operations such as convolution, average pooling, and dropout, among others, remain linear. We take advantage of associativity to combine all the “conformal layers” and make the cost of inference constant regardless of the depth of the network.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133089689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
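The associativity argument above rests on a standard fact: a chain of linear maps collapses into a single matrix, so inference cost no longer depends on depth. A plain-Python illustration of that collapse (not the paper's conformal-model implementation):

```python
from functools import reduce

def matmul(A, B):
    """Multiply an m-by-k matrix A by a k-by-n matrix B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    """Apply matrix A to vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def compose(layers):
    """Collapse linear layers [L1, L2, ...] (applied in that order) into one matrix."""
    return reduce(lambda acc, M: matmul(M, acc), layers[1:], layers[0])

L1 = [[2.0, 0.0], [0.0, 2.0]]  # toy linear layer (scaling)
L2 = [[1.0, 1.0], [0.0, 1.0]]  # toy linear layer (shear)
x = [1.0, 2.0]
# One matvec with the pre-composed matrix equals applying the layers in sequence.
assert matvec(compose([L1, L2]), x) == matvec(L2, matvec(L1, x))
```

Pre-composing can be done once at deployment time; afterwards, every input costs a single matrix-vector product regardless of how many layers were composed.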
{"title":"A Deep Learning-based Approach for Tree Trunk Segmentation","authors":"D. Jodas, S. Brazolin, T. Yojo, Reinaldo Araujo de Lima, G. Velasco, A. Machado, J. Papa","doi":"10.1109/sibgrapi54419.2021.00057","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00057","url":null,"abstract":"Recently, real-time monitoring of the urban ecosystem has attracted the attention of many municipal forestry management services. The proper maintenance of trees is crucial to guarantee the quality and safety of the streetscape. However, the current analysis still involves time-consuming fieldwork to extract the measurements of each part of the tree, such as the angle and diameter of the trunk. Therefore, real-time monitoring is essential for the rapid identification of the constituent parts of trees in images of the urban environment and the automatic estimation of their physical measures. This paper presents a method to segment tree trunks in photographs of municipal regions. To accomplish this task, we introduce a semantic segmentation convolutional neural network architecture that incorporates a depthwise residual block into the well-known U-Net model to reduce the number of parameters required to create the network. Then, we perform a post-processing step to refine the segmented regions by removing additional binary areas not related to the tree trunk. Lastly, the proposed method also extracts the central line of the identified region for future computation of the trunk measurements. Compared with the original U-Net architecture, the obtained results confirm the robustness of the proposed approach, with similar evaluation metrics and a significant reduction in network size.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131611514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
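The parameter savings from replacing a standard convolution with a depthwise separable block can be checked with simple arithmetic. This is the generic parameter-count comparison (biases ignored), not the exact block used in the paper:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution: one k x k x c_in kernel per output channel."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# A typical 3x3 layer with 64 input and 64 output channels:
print(standard_conv_params(3, 64, 64))           # 36864
print(depthwise_separable_params(3, 64, 64))     # 4672
```

For this common configuration, the separable variant needs roughly 8x fewer weights, which is the mechanism behind the "significant reduction in network size" reported above.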
{"title":"GCOOD: A Generic Coupled Out-of-Distribution Detector for Robust Classification","authors":"Rogerio Ferreira De Moraes, Raphael S. Evangelista, Leandro A. F. Fernandes, Luis Martí","doi":"10.1109/sibgrapi54419.2021.00062","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00062","url":null,"abstract":"Neural networks have achieved high degrees of accuracy in classification tasks. However, when an out-of-distribution (OOD) sample (i.e., an entry from an unknown class) is submitted to the classification process, the result is the association of the sample with one or more of the trained classes with different degrees of confidence. If any of these confidence values is greater than the user-defined threshold, the network will mislabel the sample, affecting the model's credibility. The definition of the acceptance threshold itself is a sensitive issue given the classifier's overconfidence. This paper presents the Generic Coupled OOD Detector (GCOOD), a novel Convolutional Neural Network (CNN) tailored to detect whether an entry submitted to a trained classification model is an OOD sample for that model. From the analysis of the Softmax output of any classifier, our approach can indicate whether the resulting classification should be considered a sample of some of the trained classes. To train our CNN, we developed a novel training strategy based on graph coloring and Voronoi diagrams of the locations of representative entries in the latent space of the classification model. We evaluated our approach using ResNet, VGG, DenseNet, and SqueezeNet classifiers with images from the CIFAR-10 dataset.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123909002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
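A useful point of reference for a Softmax-based detector such as GCOOD is the classic maximum-softmax-probability baseline, which flags an input as OOD when the classifier's top confidence falls below a threshold. A minimal sketch of that baseline, not of GCOOD itself; the 0.9 threshold is an arbitrary illustration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_ood(logits, threshold=0.9):
    """Flag the input as out-of-distribution when top confidence is below the threshold."""
    return max(softmax(logits)) < threshold

print(is_ood([10.0, 0.0, 0.0]))  # confident prediction -> in-distribution
print(is_ood([0.0, 0.0, 0.0]))   # flat prediction -> flagged as OOD
```

The abstract's point about overconfident classifiers is precisely the weakness of this baseline: an OOD sample can still receive a high maximum softmax value, which motivates learning a dedicated detector instead of a fixed threshold.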
{"title":"Bias and Fairness in Face Detection","authors":"Hanna F. Menezes, Arthur S. C. Ferreira, E. Pereira, H. M. Gomes","doi":"10.1109/sibgrapi54419.2021.00041","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00041","url":null,"abstract":"The processing of face images is used in many areas, such as commercial applications (e.g., video games), facial biometrics, and facial expression recognition. Face detection is a crucial step for any system that processes face images. Therefore, if there is bias or unfairness in this first step, all the processing steps that follow may be compromised. Errors in automatic face detection may be harmful to people, for instance, in situations where a decision may limit or restrict their freedom to come and go. It is therefore crucial to investigate the existence of errors caused by bias or unfairness. In this paper, five well-known, top-accuracy face detectors are analyzed to investigate the presence of bias and unfairness in their results. The metrics used to identify bias and unfairness include demographic parity, the existence of false positives and/or false negatives, the positive prediction rate, and equalized odds. Data from about 365 different individuals were randomly selected from the Facebook Casual Conversations Dataset, resulting in approximately 5,500 videos and 550,000 frames used for face detection in the experiments. The obtained results show that all five face detectors presented a high risk of not detecting faces of the female gender and of people between 46 and 85 years old. Furthermore, the groups with dark skin tones presented the highest risk of faces not being detected for four of the five evaluated face detectors. This paper highlights the need for the research community to engage in breaking the perpetuation of injustice that may be present in datasets or machine learning models.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124156747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
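Demographic parity, one of the metrics mentioned above, compares detection rates across demographic groups. A minimal sketch with binary per-frame detection outcomes (1 = face detected, 0 = missed); the toy data is illustrative only, not from the paper's experiments:

```python
def detection_rate(outcomes):
    """Fraction of frames in which a face was detected (1 = detected, 0 = missed)."""
    return sum(outcomes) / len(outcomes)

def demographic_parity_gap(group_a, group_b):
    """Absolute difference between the detection rates of two demographic groups.

    A gap near 0 indicates demographic parity; a large gap signals disparate impact.
    """
    return abs(detection_rate(group_a) - detection_rate(group_b))

# Hypothetical outcomes for two groups of frames:
print(demographic_parity_gap([1, 1, 1, 0], [1, 0, 0, 0]))  # 0.5
```

In the paper's setting, each group would correspond to a demographic attribute (gender, age band, skin tone) from the Casual Conversations annotations, with one outcome per frame.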
{"title":"Analyzing the Effects of Dimensionality Reduction for Unsupervised Domain Adaptation","authors":"Renato Sergio Lopes Junior, W. R. Schwartz","doi":"10.1109/sibgrapi54419.2021.00019","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00019","url":null,"abstract":"Deep neural networks are extensively used for solving a variety of computer vision problems. However, for these networks to obtain good results, a large amount of training data is necessary. In image classification, this training data consists of images and labels that indicate the class portrayed by each image. Obtaining such a large labeled dataset is very time- and resource-consuming. Domain adaptation methods allow different, but semantically related, datasets that are already labeled to be used during training, thus eliminating the labeling cost. In this work, the effects of embedding dimensionality reduction in a state-of-the-art domain adaptation method are analyzed. Furthermore, we experiment with a different approach that uses the available data from all domains to compute the confidence of pseudo-labeled samples. We show through experiments on commonly used datasets that the proposed modifications lead to better results in the target domain in some scenarios.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121219154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
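Pseudo-label filtering, the mechanism the abstract above builds on, keeps only target-domain samples whose predicted class probability clears a confidence threshold. A generic sketch (the paper's actual confidence computation uses data from all domains; the 0.8 threshold here is an arbitrary illustration):

```python
def select_pseudo_labels(class_probs, threshold=0.8):
    """Return (sample index, predicted class) pairs whose top probability clears the threshold."""
    selected = []
    for i, probs in enumerate(class_probs):
        c = max(range(len(probs)), key=probs.__getitem__)  # predicted class
        if probs[c] >= threshold:
            selected.append((i, c))
    return selected

# Hypothetical per-sample class probabilities for unlabeled target data:
probs = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
print(select_pseudo_labels(probs))  # only confident samples survive
```

Confident target samples are then treated as labeled during the next training round, while ambiguous ones are dropped so they do not propagate label noise.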
{"title":"SGAT: Semantic Graph Attention for 3D human pose estimation","authors":"L. Schirmer, Djalma Lúcio, Leandro Cruz, Alberto Barbosa Raposo, L. Velho, H. Lopes","doi":"10.1109/sibgrapi54419.2021.00042","DOIUrl":"https://doi.org/10.1109/sibgrapi54419.2021.00042","url":null,"abstract":"We propose a novel gating mechanism applied to Semantic Graph Convolutions for 3D applications, named Semantic Graph Attention. Semantic Graph Convolutions learn to capture semantic information, such as local and global node relationships, that is not explicitly represented in graphs. We improve their performance by proposing an attention block to explore channel-wise inter-dependencies. The proposed method performs the unprojection of 2D image points onto their 3D version, and we use it to estimate 3D human pose from 2D images. Both 2D and 3D human poses can be represented as structured graphs, and we explore their particularities in this context. The attention layer improves the accuracy of skeleton estimation while using 58% fewer parameters than the state of the art.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123312649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
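The attention idea above amounts to reweighting information before it is aggregated over the skeleton graph. A minimal sketch of softmax attention over a joint's neighbors (a generic graph-attention aggregation, not the paper's channel-wise gating architecture; the scores would normally be learned):

```python
import math

def attention_weights(scores):
    """Softmax over compatibility scores of a node's neighbors."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(neighbor_feats, scores):
    """Attention-weighted sum of neighbor feature vectors."""
    w = attention_weights(scores)
    dim = len(neighbor_feats[0])
    return [sum(w[i] * neighbor_feats[i][j] for i in range(len(w))) for j in range(dim)]

# Hypothetical 1-D features of two neighboring joints with equal scores:
print(aggregate([[2.0], [4.0]], [0.0, 0.0]))  # equal weights -> plain average
```

In a pose-estimation graph, each node is a joint and the learned scores let the network emphasize the most informative neighboring joints when lifting 2D keypoints to 3D.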
{"title":"An approach based on image processing techniques to segment lung region in chest X-ray images","authors":"Luiza C. de Menezes, Augusto R. V. F. de Araújo, A. Conci","doi":"10.1109/SIBGRAPI54419.2021.00024","DOIUrl":"https://doi.org/10.1109/SIBGRAPI54419.2021.00024","url":null,"abstract":"Chest X-ray (CXR) images help specialists worldwide to diagnose lung diseases, such as tuberculosis and COVID-19. A primary step in an image-based diagnostic tool is to segment the region of interest, which facilitates the disease classification problem by reducing the amount of information to be processed. However, due to the noisy nature of CXRs, identifying the lung region can be a challenging task. This paper addresses the lung segmentation problem using a computationally inexpensive process based on image analysis and mathematical morphology techniques. The proposed method achieved, on average, a specificity of 92.92%, a Jaccard index of 77.77%, and a Dice index of 87.37%. All images that comprise the dataset used and their respective ground truths are available for download at https://github.com/mnzluiza/Lung-Segmentation.","PeriodicalId":197423,"journal":{"name":"2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122929243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
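A classic low-cost building block for this kind of image-analysis pipeline is automatic thresholding before morphological cleanup. A sketch of Otsu's method over an intensity histogram (a standard technique offered as an illustration; the paper does not state that it uses Otsu specifically):

```python
def otsu_threshold(pixels, levels=256):
    """Pick the intensity threshold that maximizes between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = 0    # background pixel count so far
    sum_b = 0  # background intensity sum so far
    for t in range(levels):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between = w_b * w_f * (mean_b - mean_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# A strongly bimodal toy image separates cleanly:
print(otsu_threshold([10] * 50 + [200] * 50))
```

On a CXR, a threshold like this would produce a rough binary lung mask, which morphological opening/closing and connected-component filtering can then refine, matching the low-cost philosophy of the abstract.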