{"title":"A system for assisted transcription and annotation of ancient documents","authors":"María José Castro Bleda, J. M. Vilar, D. Llorens, A. Marzal, F. Prat, Francisco Zamora-Martínez","doi":"10.1145/3095713.3095752","DOIUrl":"https://doi.org/10.1145/3095713.3095752","url":null,"abstract":"Computer-assisted transcription tools can speed up the process of reading and transcribing texts. At the same time, new annotation tools open new ways of accessing the text in its graphical form. STATE, an assisted transcription system for ancient documents, offers a multimodal interaction environment to assist humans in transcribing documents: the user can type, write on the screen, or utter a word. When one of these actions is used to correct an erroneous word, the system uses this new information to look for other mistakes. The system is modular, comprising project creation from a set of document images, an automatic transcription system, and user interaction with the transcriptions to easily correct them as needed. This division of labor allows great flexibility for organizing the work in a team of transcribers. Our immediate goals are to improve the recognition system and to enrich the obtained transcriptions with scholarly descriptions.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122341074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Cartoon Colorization Based on Convolutional Neural Network","authors":"D. Varga, C. Szabó, T. Szirányi","doi":"10.1145/3095713.3095742","DOIUrl":"https://doi.org/10.1145/3095713.3095742","url":null,"abstract":"This paper deals with automatic cartoon colorization, a challenging, ill-posed problem that usually requires user intervention to achieve high quality. Motivated by recent successes in natural image colorization based on deep learning techniques, we investigate the colorization problem in the cartoon domain using Convolutional Neural Networks. To the best of our knowledge, no existing papers or research studies address this problem using deep learning techniques. Here we investigate a deep Convolutional Neural Network based automatic color filling method for cartoons.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125334383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of User Demographics from Music Listening Habits","authors":"Thomas Krismayer, M. Schedl, Peter Knees, Rick Rabiser","doi":"10.1145/3095713.3095722","DOIUrl":"https://doi.org/10.1145/3095713.3095722","url":null,"abstract":"Online activities such as social networking, shopping, and consuming multimedia create digital traces that are often used to improve user experience and increase revenue, e.g., through better-fitting recommendations and targeted marketing. We investigate to what extent the music listening habits of users of the social music platform Last.fm can be used to predict their age, gender, and nationality. We propose a TF-IDF-like feature modeling approach for artist listening information and artist tags, combined with additionally extracted features. We show that we can substantially outperform a baseline majority voting approach and can compete with existing approaches. Further, regarding prediction accuracy vs. available listening data, we show that even a single listening event per user is enough to outperform the baseline in all prediction tasks. We conclude that personal information can be derived from music listening information, which can indeed help to better tailor recommendations.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116138954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
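The TF-IDF-like feature modeling over artist listening information described in this abstract can be sketched roughly as follows. The listening histories, artist names, and the exact weighting formula below are illustrative assumptions, not the paper's actual Last.fm data or feature definition.

```python
import math
from collections import Counter

# Hypothetical listening histories: user -> list of listening events (artists).
histories = {
    "user_a": ["radiohead", "bjork", "radiohead"],
    "user_b": ["bjork", "aphex twin"],
    "user_c": ["radiohead", "aphex twin", "aphex twin"],
}

def tfidf_features(histories):
    """Build a TF-IDF-style weight per (user, artist) pair."""
    n_users = len(histories)
    # Document frequency: in how many users' histories an artist appears.
    df = Counter()
    for artists in histories.values():
        df.update(set(artists))
    features = {}
    for user, artists in histories.items():
        tf = Counter(artists)           # listening counts per artist
        total = sum(tf.values())        # total listening events of this user
        features[user] = {
            artist: (count / total) * math.log(n_users / df[artist])
            for artist, count in tf.items()
        }
    return features

feats = tfidf_features(histories)
```

Vectors like these (one dimension per artist or tag) could then feed a standard classifier predicting age, gender, or nationality.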
{"title":"Improving Hierarchical Image Classification with Merged CNN Architectures","authors":"Anuvabh Dutt, D. Pellerin, G. Quénot","doi":"10.1145/3095713.3095745","DOIUrl":"https://doi.org/10.1145/3095713.3095745","url":null,"abstract":"We consider the problem of image classification using deep convolutional networks, with respect to hierarchical relationships among classes. We investigate whether the semantic hierarchy is captured by CNN models. To this end, we analyze the confidence of the model for a category and its sub-categories. Based on the results, we propose an algorithm for improving model performance at test time by adapting the classifier to each test sample, without any re-training. We also propose a strategy for merging models to jointly learn two levels of the hierarchy. This reduces the total training time compared to training the models separately, and also gives improved classification performance.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126809995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
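The abstract's idea of test-time adaptation using the model's confidence for a category and its sub-categories might look roughly like the sketch below. The two-level hierarchy, the class indices, and the rule of restricting the fine-level prediction to the most confident coarse class are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical two-level hierarchy: coarse class -> indices of its fine classes.
hierarchy = {"animal": [0, 1], "vehicle": [2, 3]}

def hierarchy_consistent_predict(fine_probs, coarse_probs, coarse_order):
    # Test-time adaptation sketch: pick the most confident coarse class,
    # then restrict the fine-level prediction to that class's children.
    # No re-training is involved; only the outputs are re-interpreted.
    best_coarse = coarse_order[int(np.argmax(coarse_probs))]
    children = hierarchy[best_coarse]
    return children[int(np.argmax(fine_probs[children]))]

pred = hierarchy_consistent_predict(
    np.array([0.1, 0.2, 0.4, 0.3]),   # fine-level softmax output
    np.array([0.7, 0.3]),             # coarse-level softmax output
    ["animal", "vehicle"],
)
# The coarse winner is "animal", so the fine prediction is restricted to
# classes {0, 1}, even though class 2 has the highest raw fine-level score.
```

Note that an unconstrained argmax over `fine_probs` would return class 2; the hierarchy constraint overrides it.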
{"title":"CoMo: A Compact Composite Moment-Based Descriptor for Image Retrieval","authors":"S. A. Vassou, N. Anagnostopoulos, A. Amanatiadis, Klitos Christodoulou, S. Chatzichristofis","doi":"10.1145/3095713.3095744","DOIUrl":"https://doi.org/10.1145/3095713.3095744","url":null,"abstract":"Low level features play a vital role in image retrieval. Image moments can effectively represent global information of image content while being invariant under translation, rotation, and scaling. This paper briefly presents a moment based composite and compact low-level descriptor for image retrieval. In order to test the proposed feature, the authors employ the Bag-of-Visual-Words representation to perform experiments on two well-known benchmarking image databases. The robust and highly competitive retrieval performances, reported in all tested diverse collections, verify the promising potential that the proposed descriptor introduces.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121308870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeuralStory: an Interactive Multimedia System for Video Indexing and Re-use","authors":"L. Baraldi, C. Grana, R. Cucchiara","doi":"10.1145/3095713.3095735","DOIUrl":"https://doi.org/10.1145/3095713.3095735","url":null,"abstract":"In recent years, video has been swamping the Internet: websites, social networks, and business multimedia systems are adopting video as the most important form of communication and information. Videos are normally accessed as a whole and are not indexed by their visual content. Thus, they are often uploaded as short, manually cut clips with user-provided annotations, keywords, and tags for retrieval. In this paper, we propose a prototype multimedia system which addresses these two limitations: it overcomes the need for human intervention in the video setting, thanks to fully deep learning-based solutions, and decomposes the storytelling structure of the video into coherent parts. These parts can be shots, key-frames, scenes, and semantically related stories, and are exploited to provide an automatic annotation of the visual content, so that parts of a video can be easily retrieved. This also allows a principled re-use of the video itself: users of the platform can produce new storytelling by means of multi-modal presentations, add text and other media, and propose a different visual organization of the content. We present the overall solution, together with experiments on the re-use capability of our platform in edutainment, conducted through an extensive user evaluation.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122163347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Separating the Wheat from the Chaff: Events Detection in Twitter Data","authors":"Andrea Ferracani, Daniele Pezzatini, Lea Landucci, Giuseppe Becchi, A. Bimbo","doi":"10.1145/3095713.3095728","DOIUrl":"https://doi.org/10.1145/3095713.3095728","url":null,"abstract":"In this paper we present a system for the detection and validation of macro- and micro-events in cities (e.g., concerts, business meetings, car accidents) through the analysis of geolocalized messages from Twitter. We propose a simple but effective method for unknown event detection, designed to alleviate the computational issues of traditional approaches. The method is exploited by a web interface that, in addition to visualizing the results of the automatic computation, exposes interactive tools to inspect and validate the data and to refine the processing pipeline. Researchers can exploit the web application for the rapid creation of macro- and micro-event datasets of geolocalized messages, which are currently unavailable and are needed to improve supervised and unsupervised event classification on Twitter. The system has been evaluated in terms of precision.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131015453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A free Web API for single and multi-document summarization","authors":"Massimo Mauro, Sergio Benini, N. Adami, A. Signoroni, R. Leonardi, Luca Canini","doi":"10.1145/3095713.3095738","DOIUrl":"https://doi.org/10.1145/3095713.3095738","url":null,"abstract":"In this work we present a free Web API for single and multi-document summarization. The summarization algorithm follows an extractive approach, selecting the most relevant sentences from a single document or a document set. It integrates different text analysis techniques in a novel pipeline - ranging from keyword and entity extraction to topic modelling and sentence clustering - and gives results competitive with the state of the art. The application, written in Python, supports both plain texts and Web URLs as input. The API is publicly accessible for free using the specific conference token as described in the reference page. The browser-based demo version, for summarization of single documents only, is publicly accessible at http://yonderlabs.com/demo.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115066742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bangladeshi Number Plate Detection: Cascade Learning vs. Deep Learning","authors":"M. Pias, Aunnoy K. Mutasim, M. Amin","doi":"10.1145/3095713.3095727","DOIUrl":"https://doi.org/10.1145/3095713.3095727","url":null,"abstract":"This work investigated two machine learning techniques, Cascade Learning and Deep Learning, to find out which performs better at detecting the number plates of vehicles registered in Bangladesh. To do this, we created a dataset of about 1000 images collected from a security camera of Independent University, Bangladesh. Each image in the dataset was then labelled manually by selecting the Region of Interest (ROI). In the Cascade Learning approach, a sliding-window technique was used to detect objects, and a cascade classifier then determined whether each window contained an object of interest. In the Deep Learning approach, the CIFAR-10 dataset was used to pre-train a 15-layer Convolutional Neural Network (CNN). Using this pretrained CNN, a Regions with CNN (R-CNN) detector was then trained on our dataset. We found that the Deep Learning approach (maximum accuracy 99.60% using 566 training images) outperforms the detector constructed using Cascade classifiers (maximum accuracy 59.52% using 566 positive and 1022 negative training images) on 252 test images.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127337448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing weakly-Annotated Multi-label Mayan Inscriptions with Supervised t-SNE","authors":"E. Román-Rangel, S. Marchand-Maillet","doi":"10.1145/3095713.3095720","DOIUrl":"https://doi.org/10.1145/3095713.3095720","url":null,"abstract":"We present a supervised dimensionality reduction technique suitable for visualizing multi-label images in a 2-D space. This method extends the well-known t-distributed stochastic neighbor embedding (t-SNE) algorithm to the case of multi-label instances, where the concept of partial relevance plays an important role. Furthermore, it is directly applicable to weakly annotated data. We apply our approach to generate 2-D representations of Mayan glyph-blocks, which are groups of individual glyph-signs expressing full sentences. The resulting representations are used to place visual instances in a 2-D space, providing a browsable catalog for further epigraphic studies in which nearby instances are similar in both semantic and visual terms. We evaluate the performance of our approach quantitatively through classification and retrieval experiments. Our results show that it obtains high performance in both tasks.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115521032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
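The notion of partial relevance between multi-label instances, central to the supervised t-SNE extension above, can be illustrated with a simple label-overlap score. Jaccard similarity and the glyph label names below are illustrative stand-ins, not necessarily the paper's actual relevance measure or vocabulary.

```python
def label_relevance(labels_a, labels_b):
    """Partial relevance between two multi-label instances as Jaccard overlap
    of their label sets: 1.0 for identical labels, 0.0 for disjoint ones."""
    a, b = set(labels_a), set(labels_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Two glyph-blocks sharing one of their labels are partially relevant to
# each other; graded scores like this could weight pairwise similarities
# in a supervised embedding instead of a hard same-class/different-class split.
score = label_relevance({"rain", "jaguar"}, {"rain", "maize"})
```

A graded relevance of this kind lets weakly annotated instances contribute to the embedding in proportion to however many labels they share.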