{"title":"Vandalism on Collaborative Web Communities: An Exploration of Editorial Behaviour in Wikipedia","authors":"Abdulwhab Alkharashi, J. Jose","doi":"10.1145/3230599.3230608","DOIUrl":"https://doi.org/10.1145/3230599.3230608","url":null,"abstract":"Modern online discussion communities allow people to contribute, sometimes anonymously. Such flexibility is understandable, but it threatens the reputation and reliability of community-owned resources. Because little previous work has addressed these issues, it is important to study them in order to understand the ongoing vandalism of Wikipedia pages and ways of preventing it. In this study, we consider the type of activity that anonymous users carry out on Wikipedia and how others react to their activities. Our preliminary analysis reveals that about 90% of the vandalism or foul edits are made by unregistered users, owing to Wikipedia's openness. The community reaction appears to be immediate: most vandalism was reverted within five minutes on average. Further analysis sheds light on the tolerance of the Wikipedia community, the reliability of anonymous users' revisions, and the feasibility of early prediction of vandalism.","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121923126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Barriers for the access to knowledge models in Linked Data cultural heritage collections","authors":"Dayany Díaz-Corona, J. Lacasta, J. Nogueras-Iso","doi":"10.1145/3230599.3230615","DOIUrl":"https://doi.org/10.1145/3230599.3230615","url":null,"abstract":"In recent years, governments have promoted the digitisation of and online access to cultural heritage due to its importance for supporting the knowledge economy. Initiatives like Europeana or the UNESCO Memory of the World Programme have encouraged the creation of cultural heritage repositories, which currently contain millions of digitized items. Many of these repositories have adopted semantic web technologies and Linked Data approaches for their publication. However, the way in which they are described does not, in many cases, follow the best practices in the field. This work details the problems identified when analyzing the way cultural heritage resources are classified in these semantic repositories. It specifically focuses on the content of the thematic annotation of these resources, but many of the identified problems can be extrapolated to other descriptors.","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130472845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of Wikipedia categories on information retrieval research: a brief review","authors":"Jesús Tramullas, Piedad Garrido Picazo, Ana-Isabel Sánchez-Casabón","doi":"10.1145/3230599.3230617","DOIUrl":"https://doi.org/10.1145/3230599.3230617","url":null,"abstract":"Wikipedia categories, a classification scheme built for organizing and describing Wikipedia articles, are being applied in computer science research. This paper adopts a systematic literature review approach in order to identify different approaches to and uses of Wikipedia categories in information retrieval research. Several types of work are identified, depending on whether they study the category structure intrinsically or use it as a tool for processing and analyzing documentary corpora other than Wikipedia. Information retrieval is identified as one of the major areas of use, in particular its application in the refinement and improvement of search expressions, and the construction of textual corpora. However, the set of available works shows that in many cases research approaches applied and results obtained can be integrated into a comprehensive and inclusive concept of information retrieval.","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131727541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 5th Spanish Conference on Information Retrieval","authors":"","doi":"10.1145/3230599","DOIUrl":"https://doi.org/10.1145/3230599","url":null,"abstract":"","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121072708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Text Summarization based on Betweenness Centrality","authors":"Gretel Liz De la Peña Sarracén, Paolo Rosso","doi":"10.1145/3230599.3230611","DOIUrl":"https://doi.org/10.1145/3230599.3230611","url":null,"abstract":"Automatic text summarization plays an important role in information retrieval. With a large volume of information, presenting the user with only a summary greatly facilitates the search for the most relevant content. Therefore, this task can provide a solution to the problem of information overload. Automatic text summarization is the process of automatically creating a compressed version of a given text that provides useful information for users. This article presents an unsupervised extractive approach based on graphs. The method constructs an undirected weighted graph from the original text by adding a vertex for each sentence, and calculates a weighted edge between each pair of sentences based on a similarity/dissimilarity criterion. The main contribution of the work is a study of the impact of a known algorithm for social network analysis, which allows large graphs to be analyzed efficiently. As a measure to select the most relevant sentences, we use betweenness centrality. The method was evaluated on the open reference data set DUC2002 with ROUGE scores.","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125648621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-based recommendation for Academic Expert finding","authors":"César Albusac, L. M. D. Campos, J. M. Fernández-Luna, J. Huete","doi":"10.1145/3230599.3230607","DOIUrl":"https://doi.org/10.1145/3230599.3230607","url":null,"abstract":"Nowadays, Web users increasingly search for professionals who can help solve a problem in a given field. This is called expert finding. A particular case is when users are interested in scientific researchers. The associated problem is to obtain, given a query that expresses a topic of interest for a user, a set of researchers who are experts on it. One of the difficulties in tackling the problem is to identify the topics in which a professional is an expert. In this paper, we approach this problem from a content-based recommendation perspective and present a method where, starting from the articles published by each researcher and a query, the expert researchers are obtained. We also present a new document collection, called PMSC-UGR, specifically designed for evaluation in the fields of expert finding and document filtering.","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133100632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Structured Representation of Results in an Information Retrieval System for Public Examination Calls","authors":"Rocío Aznar-Gimeno, María del Carmen Rodríguez-Hernández, R. del-Hoyo-Alonso, S. Ilarri","doi":"10.1145/3230599.3230604","DOIUrl":"https://doi.org/10.1145/3230599.3230604","url":null,"abstract":"Nowadays, the huge amount of information available may easily overwhelm users. Information Retrieval techniques can help users find what they need, but there are still challenges to solve within this research area. An example is the problem of minimizing the user's search time to find specific information in unstructured texts within the retrieved documents, in different application domains. The use of supervised learning-based information extraction techniques can be a solution to this problem. However, a supervised learning model requires as input a large labeled dataset, generated manually by experts. Moreover, there are currently very few information extraction frameworks that make it possible to reduce or avoid the human effort needed to label such training datasets. In this paper, we present our work in progress towards the development of an information retrieval system that will display structured, centralized and updated information extracted from documents corresponding to calls for public examinations. In this scenario, the search engine should be able to display not only the documents relevant to the user's query, but also specific data contained in the documents. In addition, we present a study of frameworks that can be used in this context as well as our preliminary experience with the use of the Snorkel framework. In the future, we plan to complete our proposal and also extend it to other types of documents published in Spanish official bulletins.","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125341038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New approaches for evaluation: correctness and freshness: Extended Abstract","authors":"Pablo Sánchez, Rus M. Mesas, Alejandro Bellogín","doi":"10.1145/3230599.3230614","DOIUrl":"https://doi.org/10.1145/3230599.3230614","url":null,"abstract":"The main goal of a Recommender System is to suggest relevant items to users, although other utility dimensions -- such as diversity, novelty, confidence, possibility of providing explanations -- are often considered. In this work, we study two dimensions that have been neglected so far in the literature: coverage and temporal novelty. On the one hand, we present a family of metrics that combine precision and coverage in a principled manner (correctness); on the other hand, we provide a measure to account for how much a system is promoting fresh items in its recommendations (freshness). Empirical results show the usefulness of these new metrics to capture more nuances of the recommendation quality.","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"85 36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130348981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"About BIRDS project (Bioinformatics and Information Retrieval Data Structures Analysis and Design)","authors":"Guillermo de Bernardo, Susana Ladra","doi":"10.1145/3230599.3230602","DOIUrl":"https://doi.org/10.1145/3230599.3230602","url":null,"abstract":"BIRDS stands for \"Bioinformatics and Information Retrieval Data Structures analysis and design\" and is a 4-year project (2016--2019) that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 690941. The overall goal of BIRDS is to establish a long-term international network involving leading researchers in the development of efficient data structures in the fields of Bioinformatics and Information Retrieval, to strengthen the partnership through the exchange of knowledge and expertise, and to develop integrated approaches that improve on current methods in both fields. The research will address challenges in storing, processing, indexing, searching and navigating genome-scale data by designing new algorithms and data structures for sequence analysis, network representation, or compressing and indexing repetitive data. The BIRDS project is carried out by 7 research institutions from Australia (University of Melbourne), Chile (University of Chile and University of Concepción), Finland (University of Helsinki), Japan (Kyushu University), Portugal (Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa, INESC-ID), and Spain (University of A Coruña), and a Spanish SME (Enxenio S.L.). It is coordinated by the University of A Coruña (Spain).","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134137357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building Python-Based Topologies for Massive Processing of Social Media Data in Real Time","authors":"Rodrigo Martínez-Castaño, J. C. Pichel, D. Losada","doi":"10.1145/3230599.3230618","DOIUrl":"https://doi.org/10.1145/3230599.3230618","url":null,"abstract":"In this paper we propose a streaming approach for real-time processing of huge amounts of data. CATENAE is a library for easily building and executing Python topologies (e.g., web crawler, classifier). Topologies are designed for deployment inside Docker containers and, thus, horizontal scaling, granular resource assignment and isolation can be achieved easily. Furthermore, each micromodule can have its own dependencies (including the Python version), allowing the user to limit resources such as CPU or memory per instance. We describe an implementation of a use case composed of two topologies: (1) a crawler for tracking users in social media and (2) an early risk detector of depression. We also explain how CATENAE topologies can be connected to non-Python systems.","PeriodicalId":448209,"journal":{"name":"Proceedings of the 5th Spanish Conference on Information Retrieval","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125871322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}