{"title":"Construction et évaluation d'un corpus pour la recherche d'instances d'images muséales","authors":"Maxime Portaz, J. Poignant, Mateusz Budnik, Philippe Mulhem, J. Chevallet, Lorraine Goeuriot","doi":"10.24348/coria.2017.5","DOIUrl":"https://doi.org/10.24348/coria.2017.5","url":null,"abstract":"This paper presents two datasets of annotated photos and videos from two museums: the Musée de Grenoble, which holds mainly paintings, and the Lyon-Fourvière museum, which holds Celtic and pre-Roman objects. In total, they contain 4674 annotated images, corresponding to 784 different artworks, and 3h07 of first-person museum-visit videos shot by 5 people. These datasets can be used as a challenge for image retrieval and for video segmentation and annotation tasks, and they are freely available to the research community. The image collections provide 361 queries over a corpus of 4313 documents. Moreover, 2132 additional images are extracted from the visit videos, making it possible to test images from other sources. Three state-of-the-art approaches are run and evaluated on these collections. KEYWORDS: image instance retrieval, corpus.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124288905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retweeter ou ne pas retweeter : Le dilemme des portails de diffusion d'information temps-réel","authors":"T. Palmer, G. Hubert, Karen Pinel-Sauvagnat","doi":"10.24348/coria.2017.28","DOIUrl":"https://doi.org/10.24348/coria.2017.28","url":null,"abstract":"The study of contextual features has been widely addressed in Information Retrieval (IR), but concrete applications on real data streams are not very widespread. In this paper, we address the automatic decision of whether to retweet a message. Considering a user's topic of interest, we propose a model that performs automatic real-time filtering of the Twitter stream using multiple contextual features. The model separates the contextual aspect from the content of the message itself, while maintaining a very high execution speed. Experiments were carried out on the TREC Microblog 2015 collection. The results show that integrating contextual features has a positive impact on filtering effectiveness without penalizing its efficiency.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122064126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graphe de communauté pour la validation de relations dans le cadre de la population de bases de connaissances","authors":"Rashedur Rahman, B. Grau, S. Rosset","doi":"10.24348/coria.2017.37","DOIUrl":"https://doi.org/10.24348/coria.2017.37","url":null,"abstract":"Extracting relations between entities from text is an important step for information extraction and knowledge discovery tasks. Systems produce many candidates, and the relation validation task consists of deciding whether a candidate relation is correct or not based on the information provided by the systems. In this paper, we propose a new set of features based on the analysis of the graphs generated by the relations between entities, which complements the features derived from linguistic analysis.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125255757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of abusive messages in an on-line community","authors":"Etienne Papegnies, Vincent Labatut, Richard Dufour, G. Linarès","doi":"10.24348/coria.2017.16","DOIUrl":"https://doi.org/10.24348/coria.2017.16","url":null,"abstract":"Moderating user content in online communities is mainly performed manually, and reducing the workload through automatic methods is of great interest. The industry mainly uses basic approaches such as bad words filtering. In this article, we consider the task of automatically determining whether a message is abusive or not. This task is complex, because messages are written in a non-standardized natural language. We propose an original automatic moderation method applied to French, which is based on both traditional tools and a newly proposed context-based feature relying on the modeling of user behavior when reacting to a message. The results obtained during this preliminary study show the potential of the proposed method, in a context of automatic processing or decision support.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124787119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modèle Neuronal de Recherche d'Information Augmenté par une Ressource Sémantique","authors":"Gia-Hung Nguyen, Lynda Tamine, Laure Soulier, Nathalie Bricon-Souf","doi":"10.24348/coria.2017.4","DOIUrl":"https://doi.org/10.24348/coria.2017.4","url":null,"abstract":"In information retrieval (IR), word semantics has been recognized as a significant means of improving document-query matching. On the one hand, symbolic semantics derived from external resources makes it possible to represent entities and their explicit relations. On the other hand, distributional semantics inferred from corpora makes it possible to represent the implicit semantic relations of a corpus. In this paper, we propose to combine these two types of semantic representations. We present a neural model for ad-hoc IR that leverages the latent semantic representations of documents and queries by taking advantage of the concepts and relations expressed within an external resource. Experimental results obtained on two datasets indicate the effectiveness of our model in comparison with state-of-the-art deep neural matching models.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128044576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Étude d'un modèle d'inférence de connaissances à partir de textes","authors":"P. Jean, Sébastien Harispe, Sylvie Ranwez, P. Bellot, Jacky Montmain","doi":"10.24348/coria.2017.10","DOIUrl":"https://doi.org/10.24348/coria.2017.10","url":null,"abstract":"This paper proposes an automated approach to knowledge inference based on the analysis of relations extracted from texts. Its originality lies in the definition of a framework that takes into account (i) a structuring of the studied objects (e.g. noun phrases) in the form of a partial order and (ii) the possible use of prior knowledge formalized in an ontology-type knowledge model (taxonomy). In particular, this framework makes it possible to define information propagation rules based on belief theory in order to infer new knowledge from the extracted relations. Although its scope is broader, our approach is illustrated and evaluated here through the definition of an automatic system that uses texts from the Web to answer generated questionnaires. We show, in particular, the value of structuring the extractions and the gain brought by taking prior knowledge into account within such a processing pipeline.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128224118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Test of Cluster Hypothesis Using a Scalable Similarity-Based Agglomerative Hierarchical Clustering Framework","authors":"Xinyu Wang, Julien Ah-Pine, J. Darmont","doi":"10.24348/coria.2017.RJCRI_15","DOIUrl":"https://doi.org/10.24348/coria.2017.RJCRI_15","url":null,"abstract":"The Cluster Hypothesis is the fundamental assumption of using clustering in Information Retrieval. It states that similar documents tend to be relevant to the same query. Past research works extensively test this hypothesis using agglomerative hierarchical clustering (AHC) methods. However, their conclusions are not consistent concerning retrieval effectiveness for a given clustering method. The main limit of these works is the scalability issue of AHC. In this paper, we extend our previous work to a new test of the cluster hypothesis by applying a scalable similarity-based AHC framework. Principally, the input pairwise cosine similarity matrix is sparsified by given threshold values to reduce memory usage and running time. Our experiments show that even when the similarity matrix is largely sparsified, retrieval effectiveness is retained for all tested methods. Moreover, for two clustering methods, complete link and average link, they do not always dominate the other methods as reported in past works.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114507185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prédiction automatique d'emojis sentimentaux","authors":"Gaël Guibon, Magalie Ochs, P. Bellot","doi":"10.24348/coria.2017.24","DOIUrl":"https://doi.org/10.24348/coria.2017.24","url":null,"abstract":"In social messaging applications, emojis are among the main carriers of individuals' emotions and sentiments. Today, users browse libraries that often contain thousands of emojis to select the one that matches what they want to convey. Our work aims to develop an automatic emoji recommendation system that lets the user identify a small set of emojis relevant to the conversation, avoiding the need to scroll through large emoji libraries. This recommendation could also allow the user to query for sentences likely to contain a given emoji and the emotion associated with it. To this end, our first objective is to develop a tool that automatically predicts the emojis of a sentence, using a classification model learned on a social messaging corpus of real informal text messages containing emojis. Several features are considered for training, such as the user's sentiment and mood. In this paper, we describe the impact of these features and the performance of the resulting models. KEYWORDS: multi-label classification, emoji recommendation, sentiment analysis.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125639067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Étude préliminaire de reconnaissance d'écriture sur des documents historiques","authors":"Adeline Granet, E. Morin, H. Mouchère, Solen Quiniou, C. Viard-Gaudin","doi":"10.24348/coria.2017.RJCRI_11","DOIUrl":"https://doi.org/10.24348/coria.2017.RJCRI_11","url":null,"abstract":"This work addresses information extraction from the account registers of the 18th-century Comédie-Italienne. These registers contain valuable information for researchers in the humanities and social sciences who work on the acculturation of the Italian actors of that period. Information extraction from old documents that have not yet been studied is a long and complex process that requires expertise at every step: detection and segmentation into blocks, lines or words, feature extraction, and handwriting recognition. Recurrent neural networks of the BLSTM type with CTC decoding are among the most promising methods in handwriting recognition for labeling a given input sequence and producing a recognition result. This paper presents a preliminary study of the use of this type of neural network for a first task: recognizing the titles of plays in multilingual historical documents (French and Italian) that use a closed vocabulary composed mainly of named entities.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123626299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recherche de conversations dans les réseaux sociaux : modélisation et expérimentations sur Twitter","authors":"Nawal Ould Amer, Philippe Mulhem, Mathias Géry","doi":"10.24348/coria.2015.6","DOIUrl":"https://doi.org/10.24348/coria.2015.6","url":null,"abstract":"This paper addresses the indexing and retrieval of conversations in social networks. A conversation is a set of messages exchanged between users following an initial message. The proposed approach is based on probabilistic modeling and details, in particular, the use of social information in the Twitter network. Our proposal is evaluated on a corpus of conversations containing more than 50,000 tweets and on a set of 15 queries drawn in part from the TREC Microblog campaigns (Lin and Efron, 2013). The results obtained on this corpus by combining content elements and social elements are statistically significantly better than those of our approach using content alone and those of a BM25-based approach.","PeriodicalId":390974,"journal":{"name":"Conférence en Recherche d'Infomations et Applications","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116512145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}