{"title":"Self-Supervised Spiking Neural Networks applied to Digit Classification","authors":"Benjamin Chamand, P. Joly","doi":"10.1145/3549555.3549559","DOIUrl":"https://doi.org/10.1145/3549555.3549559","url":null,"abstract":"The self-supervised learning (SSL) paradigm has been a rapidly growing research area in recent years, with promising results, especially in the field of image processing. In order for these models to converge towards discriminative representations, data augmentation is applied to the input data that feeds two-branch networks. On the other hand, Spiking Neural Networks (SNNs) are attracting a growing community due to their ability to process temporal information, their low energy consumption and their high biological plausibility. Thanks to the use of Poisson process stochasticity to encode the same data into different temporal representations, and the success of surrogate gradients in training, we propose a self-supervised learning method applied to an SNN, and we conduct a preliminary study of the generated representations. We show its feasibility by training our architecture on a dataset of digit images (MNIST) and then evaluating the representations with two classification methods.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127195700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ecological Impact Assessment Framework for areas affected by Natural Disasters","authors":"A. Setyanto, Kusrini Kusrini, G. B. Adninda, Renindya Kartikakirana, Rhisa Aidilla Suprapto, A. Laksito, I. M. D. Agastya, K. Chandramouli, A. Majlingová, Yvonne Brodrechtová, K. Demestichas, E. Izquierdo","doi":"10.1145/3549555.3549596","DOIUrl":"https://doi.org/10.1145/3549555.3549596","url":null,"abstract":"The forest's biodiversity consists of relations between trees, animals, the environment, and surrounding communities. Their existence requires a certain balance in both number and composition. The diversity of these elements creates a chain that connects all living things. At the same time, these mutual relationships are sometimes disturbed by pressures, whether man-made or natural. As a consequence, biodiversity loses its balance and becomes vulnerable to disaster. Forest fires that damage every living thing in the forest have become a massive issue in forest management. In some instances, the balance of forest biodiversity provides an ecological resilience essential to the forest's ability to withstand disturbance. This paper reviews biodiversity elements and the extent to which they support ecological resilience, based on a review of 58 studies related to biodiversity balance and ecological resilience. The review found evidence that biodiversity components are connected and support each other; however, not every relation contributes to ecological resilience. As a result, we identify several biodiversity elements that may be useful in supporting ecological resilience: trees, the environment, animals, and communities. We also provide two case examples in which the values of some biodiversity elements are obtained using a deep learning method.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129022239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Exploration into the Benefits of the CLIP model for Lifelog Retrieval","authors":"Ly-Duyen Tran, Naushad Alam, Yvette Graham, L. K. Vo, N. T. Diep, Binh T. Nguyen, Liting Zhou, C. Gurrin","doi":"10.1145/3549555.3549593","DOIUrl":"https://doi.org/10.1145/3549555.3549593","url":null,"abstract":"In this paper, we fine-tune the CLIP (Contrastive Language-Image Pre-Training) model on the Lifelog Question Answering dataset (LLQA) to investigate the retrieval performance of the fine-tuned model over the zero-shot baseline. We train the model using a weight-space ensembling approach with a modified loss function that takes into account the differences between our dataset (LLQA) and the dataset on which the CLIP model was originally pretrained. We further evaluate our fine-tuned model using visual as well as multimodal queries on multiple retrieval tasks, demonstrating improved performance over the zero-shot baseline model.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127017827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Potential of Webcam Based Real Time Eye-Tracking to Reduce Rendering Cost","authors":"Isabel Kütemeyer, M. Lux","doi":"10.1145/3549555.3549595","DOIUrl":"https://doi.org/10.1145/3549555.3549595","url":null,"abstract":"Performance optimisation continues to be a relevant topic in both hardware and software development, with video games producing fully rendered images every 16 or 33 ms, depending on the desired frame rate. Human observers close their eyes for about 300 ms an average of twelve times per minute, which means many frames are never observed. This paper examines whether rendering time can be reduced by detecting and skipping these unobserved frames. Blinks were identified at runtime by computing the eye aspect ratio of the observer from low-quality web camera footage. A prototype using this method was tested on a small group of subjects to determine whether footage watched this way was perceived as distracting or of lesser quality than unaltered images. Results from a questionnaire suggest that the altered footage did not affect the subjects’ opinions, with no participant reporting any visual disturbances. Because this test used video footage, skipping frames was substituted by a lower-resolution render. Altered frames were rendered an average of five percent faster than their unaltered counterparts.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130571964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-shot Object Detection as a Semi-supervised Learning Problem","authors":"W. Bailer, Hannes Fassold","doi":"10.1145/3549555.3549599","DOIUrl":"https://doi.org/10.1145/3549555.3549599","url":null,"abstract":"This paper addresses few-shot learning settings in which different classes are annotated on different datasets. Each part of the data has exhaustive annotations for only one or a small set of classes, but not for the others used in training. It is likely that unannotated samples of a class exist, potentially impacting the gradient as negative samples. For this reason, we argue that few-shot learning is essentially a semi-supervised learning problem, and we analyze how approaches from semi-supervised learning can be applied. In particular, we study the use of soft-sampling to weight the gradient based on the overlap between detections and ground truth, and the creation of missing annotations using a preliminary detector. Soft-sampling provides small but consistent improvements, at much lower computational effort than predicting additional annotations.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134230905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentiment analysis on 2D images of urban and indoor spaces using deep learning architectures","authors":"Konstantinos Chatzistavros, Theodora Pistola, S. Diplaris, K. Ioannidis, S. Vrochidis, Y. Kompatsiaris","doi":"10.1145/3549555.3549575","DOIUrl":"https://doi.org/10.1145/3549555.3549575","url":null,"abstract":"This paper focuses on determining the sentiments evoked in people by observing outdoor and indoor spaces, aiming to create a tool for designers and architects that can be utilized for sophisticated designs. Since sentiment is subjective, the design process can be facilitated by an ancillary automated tool for sentiment extraction. In addition, a dataset containing both real and virtual images of vacant architectural spaces is introduced, and the SUN attributes are extracted from the images in order to be included throughout training. The dataset is annotated for both valence and arousal, and five established and two custom architectures, one of which has never been used before for classifying abstract concepts, are evaluated on the collected data.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133602949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Human Performance on Sketch-Based Image Retrieval","authors":"Omar Seddati, S. Dupont, S. Mahmoudi, T. Dutoit","doi":"10.1145/3549555.3549582","DOIUrl":"https://doi.org/10.1145/3549555.3549582","url":null,"abstract":"Sketch-based image retrieval (SBIR) solutions are attracting increased interest in the field of computer vision. These solutions provide an intuitive and powerful tool to retrieve images from large-scale image databases. In this paper, we conduct a comprehensive study of classic triplet CNN training pipelines in the SBIR context. We study the impact of embedding normalization, model sharing, margin selection, batch size, hard-mining selection and the evolution of the number of hard triplets during training, and propose several avenues for improvement. We also propose dropout column, an adaptation of dropout for triplet networks and similar pipelines. In addition, we introduce a novel approach to building state-of-the-art SBIR solutions that can be used on low-power systems. The whole study is conducted on The Sketchy Database, a large-scale SBIR database. We carry out a series of experiments and show that adopting a few simple modifications significantly enhances existing SBIR pipelines (faster training and higher accuracy). Our study enables us to propose an enhanced pipeline that outperforms the previous state of the art on the Sketchy Database by a significant margin (a recall of 53.92% compared to 46.2% at k = 1) and almost reaches human performance (54.27%) on a large-scale benchmark.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128356600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chest Diseases Classification Using CXR and Deep Ensemble Learning","authors":"Adnane Ait Nasser, M. Akhloufi","doi":"10.1145/3549555.3549581","DOIUrl":"https://doi.org/10.1145/3549555.3549581","url":null,"abstract":"Chest diseases are among the most common health problems worldwide; they are potentially life-threatening disorders that can affect organs such as the lungs and heart. Radiologists typically use visual inspection to diagnose chest X-ray (CXR) diseases, a difficult task prone to errors. The signs of chest abnormalities appear as opacities around the affected organ, making it difficult to distinguish between diseases of superimposed organs. To this end, we propose a first method for CXR organ disease detection using deep learning. We use an ensemble learning (EL) approach to increase the efficiency of classifying CXR diseases by organ (lung and heart) using a consolidated dataset. This dataset contains 26,316 CXR images from the VinDr-CXR and CheXpert datasets. The proposed ensemble of deep convolutional neural networks (DCNN) achieves excellent performance with an AUC of 0.9489 for multi-class classification, outperforming many state-of-the-art models.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128371374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A large-scale TV video and metadata database for French political content analysis and fact-checking","authors":"Frédéric Rayar, Mathieu Delalandre, Van-Hao Le","doi":"10.1145/3549555.3549557","DOIUrl":"https://doi.org/10.1145/3549555.3549557","url":null,"abstract":"In this paper, we introduce a large-scale, publicly available multimodal dataset for French political content analysis and fact-checking. This dataset consists of more than 1,200 fact-checked claims scraped from a fact-checking service, with associated metadata. For the video counterpart, the dataset contains nearly 6,730 TV programs, with a total duration of 6,540 hours, along with metadata. These programs were collected during the 2022 French presidential election with a dedicated workstation.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116676490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StyleGAN-based CLIP-guided Image Shape Manipulation","authors":"Yuchen Qian, Kohei Yamamoto, Keiji Yanai","doi":"10.1145/3549555.3549556","DOIUrl":"https://doi.org/10.1145/3549555.3549556","url":null,"abstract":"In this paper, we propose a text-guided image manipulation method that focuses on editing shape attributes using a text description. We combine an image generation model, StyleGAN2, with an image-text matching model, CLIP, and achieve image shape attribute manipulation by modifying the parameters of the pretrained StyleGAN2 generator. Qualitative and quantitative evaluations demonstrate the effectiveness of the proposed method.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121983044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}