{"title":"A domain adaptive deep learning solution for scanpath prediction of paintings","authors":"M. A. Kerkouri, M. Tliba, A. Chetouani, A. Bruno","doi":"10.1145/3549555.3549597","DOIUrl":"https://doi.org/10.1145/3549555.3549597","url":null,"abstract":"Cultural heritage understanding and preservation is an important issue for society as it represents a fundamental aspect of its identity. Paintings represent a significant part of cultural heritage, and are the subject of study continuously. However, the way viewers perceive paintings is strictly related to the so-called HVS (Human Vision System) behaviour. This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings. In further details, we introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans, including the fundamental understanding of a scene, and then extend it to painting images. The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers’ attention. We use an FCNN (Fully Convolutional Neural Network), in which we exploit a differentiable channel-wise selection and Soft-Argmax modules. We also incorporate learnable Gaussian distributions onto the network bottleneck to simulate visual attention process bias in natural scene images. Furthermore, to reduce the effect of shifts between different domains (i.e. natural images, painting), we urge the model to learn unsupervised general features from other domains using a gradient reversal classifier. The results obtained by our model outperform existing state-of-the-art ones in terms of accuracy and efficiency.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129397431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Streaming learning with Move-to-Data approach for image classification","authors":"Abel Kahsay Gebreslassie, J. Benois-Pineau, A. Zemmari","doi":"10.1145/3549555.3549590","DOIUrl":"https://doi.org/10.1145/3549555.3549590","url":null,"abstract":"In Deep Neural Network training, the availability of a large amount of representative training data is the sine qua non-condition for a good generalization capacity of the model. In many real-world applications, data is not available at a glance, but coming on the fly. If a pre-trained model is fine-tuned on the new data, then catastrophic forgetting happens mostly. Incremental learning mechanisms propose ways to overcome catastrophic forgetting. Streaming learning is a type of incremental learning where models learn from new data instances as soon as they become available in a single training pass. In this work, we conduct an experimental study, on a large dataset, of an incremental/streaming learning method Move-to-Data we previously proposed, and propose an updated approach by ”re-targeting” with gradient descent which is faster than the popular streaming learning method ExStream. The method achieves better performances and computational efficiency compared to ExStream. Move-to-Data with gradient is on average 3.5 times faster than ExStream and has a similar accuracy, with 0.5% improvement compared to ExStream.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124401797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BiasUNet: Learning Change Detection over Sentinel-2 Image Pairs","authors":"Maria Pegia, A. Moumtzidou, Ilias Gialampoukidis, Björn þór Jónsson, S. Vrochidis, Y. Kompatsiaris","doi":"10.1145/3549555.3549574","DOIUrl":"https://doi.org/10.1145/3549555.3549574","url":null,"abstract":"The availability of satellite images has increased due to the fast development of remote sensing technology. As a result several deep learning change detection methods have been developed to capture spatial changes from multi temporal satellite images that are of great importance in remote sensing, monitoring environmental changes and land use. Recently, a supervised deep learning network called FresUNet has been proposed, which performs a pixel-level change detection from image pairs. In this paper, we extend this method by inserting a Bayesian framework that uses Monte Carlo Dropout, motivated by a recent work in image segmentation. The proposed Bayesian FresUNet (BiasUNet) approach is shown to outperform four state-of-the-art deep learning networks on Sentinel-2 ONERA Satellite Change Detection (OSCD) benchmark dataset, both in terms of precision and quality.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133211608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey for image based methods in construction: from images to digital twins","authors":"I. Koulalis, Nikolaos I. Dourvas, Theocharis Triantafyllidis, K. Ioannidis, S. Vrochidis, Y. Kompatsiaris","doi":"10.1145/3549555.3549594","DOIUrl":"https://doi.org/10.1145/3549555.3549594","url":null,"abstract":"In the construction domain, Digital twins are mostly used for facilities management of buildings, but their applications are still very limited. The virtualization of buildings and bridges in the last 15 years in the form of Building or Bridge Information Models is clearly identified as the starting point for the DTs. The industry has erected a frame with semantically rich 3D reference models that are now heavily enriched with visual sensor data captured on construction sites. This article provides an overview of the research and current practices of computer vision methods in the construction industry and presents typical examples of their applications for 3D reconstruction, safety management and structural monitoring for quality assurance. It then highlights the dominant achievements presented in the literature and concludes with the challenges and research directions applicable to digital twins that need to be addressed and exploited in the future.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114220765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Urban Image Geo-Localization Using Open Data on Public Spaces","authors":"Mathias Glistrup, S. Rudinac, Björn þór Jónsson","doi":"10.1145/3549555.3549589","DOIUrl":"https://doi.org/10.1145/3549555.3549589","url":null,"abstract":"In this paper, we study the problem of urban image geo-localization, where the aim is to estimate the real-world location in which an image was taken. Among the previous approaches to this task, we note three distinct categories: one only analyzes metadata; the other only analyzes the image content; and the third combines the two. However, most previous approaches require large annotated collections of images or their metadata. Instead of relying on large collections of images, we propose to use publicly available geographical (GIS) data, which contains information about urban objects in public spaces, as a backbone database to query images against. We argue that images can be effectively represented by the objects they contain, and that the spatial geometry of a scene—i.e., the positioning of these objects relative to each other—can function as a unique identifier for a particular physical location. Our experiments demonstrate the potential of using open GIS data for precise image geolocation estimation and serve as a baseline for future research in multimedia geo-localization.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128090222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Nearest Neighbor Indexing by Multitask Learning","authors":"Amorntip Prayoonwong, Ke Zeng, Chih-Yi Chiu","doi":"10.1145/3549555.3549579","DOIUrl":"https://doi.org/10.1145/3549555.3549579","url":null,"abstract":"In the task of approximate nearest neighbor search, the conventional lookup-table indexing calculates the distances (or similarities) between the query and codewords, and then re-ranks the data points associated with the nearest (or the most similar) codewords. To address the codeword quantization loss problem exhibited in the conventional method, the probability-based indexing leverages the data distribution among codewords learned by neural networks to locate the nearest neighbor [8]. In this paper, we present a multitasking model to improve the probability-based indexing method. The model is formulated by two objectives of NN distribution probabilities and data retrieval quantity. The NN distribution probabilities are an estimation to determine the possible codewords where the nearest neighbor may be associated. The candidate retrieval quantity specifies the prediction for the least number of codewords to be re-ranked for capturing the nearest neighbor. The proposed model is then trained by minimizing triplet loss, probability loss, and quantity loss. By learning these tasks in parallel, we find the predictions for both data distribution probability and data retrieval quantity are more accurate, so that search accuracy and computation efficiency can be improved together. We experiment on two billion-scale benchmark datasets to evaluate the proposed method and compare with several approximate nearest neighbor search methods, and the results demonstrate the outperformance of the proposed method.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121933515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relational Database Performance for Multimedia: A Case Study","authors":"Björn þór Jónsson, Aaron Duane, Nikolaj Mertz","doi":"10.1145/3549555.3549558","DOIUrl":"https://doi.org/10.1145/3549555.3549558","url":null,"abstract":"This paper describes the performance optimisation of a state-of-the-art relational database to more efficiently serve data for multimedia visualisations in the ViRMA prototype. We describe the baseline database and queries, along with two major optimisation steps that improve query efficiency, at the cost of slowing down dynamic updates. We evaluate the optimisations with a case study of a lifelog collection of 182K images, showing that the time to produce complex visualisations is reduced by orders of magnitude.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"85 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120924469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time deblurring network for face AR applications","authors":"Juhwan Lee, Jonghan Lee, S. Yoo","doi":"10.1145/3549555.3549577","DOIUrl":"https://doi.org/10.1145/3549555.3549577","url":null,"abstract":"Deblurring is a problem that has been studied for a long time. Extant works have primarily focused on deblurring real-world images. However, face images are different from real-world images. Because face images have fewer textures and weaker edges than real-world images, the deblurring of real-world images focuses on restoring the overall texture of the image; however, restoring the particular face structure (e.g., eyes, nose, and ears) is essential for face images. Recently, a convolutional neural network(CNN)-based deblurring network has been proposed. There are various types of CNN-based deblurring networks. Recently, multiscale architecture has been widely used; however, these types of networks need large amounts of resources. Further, because of the multitude of parameters, it requires a significant amount of time for inference. In this study, we developed a end-to-end network for face image deblurring, wherein novel CNN-based feature attention (FA) blocks are adopted, and a low inference time is achieved. Moreover, discrete Fourier transform (DFT) is employed for high-quality deblurring. FA blocks combine channel attention layer and pixel attention layer for feature extraction. The spectrum obtained using DFT is used as a loss function by comparing the ground truth image with the deblurring image. Experimental results show that the ours network is comparable to other deblurring networks in terms of performance as indicated by the PSNR, SSIM. Moreover we also demonstrated performance improvement by measuring the mean Intersection over Union (mIoU) of the deblurred image using a face-segmentation network.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132810197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fine Grained Quality Assessment of Video Anomaly Detection","authors":"Jiang Zhou, Kevin McGuinness, Joseph Antony, Noel E O 'connor","doi":"10.1145/3549555.3549569","DOIUrl":"https://doi.org/10.1145/3549555.3549569","url":null,"abstract":"In this paper we propose a new approach to assess the performance of video anomaly detection algorithms. Inspired by the COCO metrics we propose a quartile based quality assessment of video anomaly detection to have a detailed breakdown of algorithm performance. The proposed assessment divides the detection into five categories based on the measurement quartiles of the position, scale and motion magnitude of anomalies. A weighted precision is introduced in the average precision calculation such that the frame-level average precision reported in categories can be compared to each other regardless of the baseline of the precision-recall curve in every category. We evaluated three video anomaly detection approaches, including supervised and unsupervised approaches, on five public datasets using the proposed approach. Our evaluation shows that the anomaly scale introduces performance difference in detection. For both supervised and unsupervised methods evaluated, the detection achieve higher average precision for the large anomalies in scale. Our assessment also shows that the supervised multiple instance learning method is robust to the motion magnitude differences in anomalies, while the unsupervised one-class neural network method performs better than the unsupervised autoencoder reconstruction method when the motion magnitudes are small. Our experiments, however, also show that the positions of the anomalies have impact on the performance of the multiple instance learning method and the one-class neural network method but the impact on the autoencoder-based approach is negligible.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130688181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmenting partially annotated medical images","authors":"Nicolas Martin, J. Chevallet, G. Quénot","doi":"10.1145/3549555.3549570","DOIUrl":"https://doi.org/10.1145/3549555.3549570","url":null,"abstract":"Segmentation of medical images using learning based systems remains a challenge in medical computer vision: training a segmentation model requires medical images exhaustively annotated by experts that are difficult and expensive to obtain. We propose to explore the usage of partially annotated images, i.e., all images are annotated but not all regions of a given class are annotated. In this paper, we propose several approaches and we experiment them on the segmentation of intra-oral images. First, we propose to modify the loss function to consider only the annotated areas, and second to integrate annotation from non-expert, as well as the combination of these methods. The experiments we conducted showed an improvement up to 33% on the segmentation performance. This approach allows to obtain better quality annotation masks than the initial human annotation using only partially annotated areas or non-expert annotations. In the future, these approaches can be extended by combination with active learning methods.","PeriodicalId":191591,"journal":{"name":"Proceedings of the 19th International Conference on Content-based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123820468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}