Towards a Structured Representation of Results in an Information Retrieval System for Public Examination Calls
Rocío Aznar-Gimeno, María del Carmen Rodríguez-Hernández, R. del-Hoyo-Alonso, S. Ilarri
Proceedings of the 5th Spanish Conference on Information Retrieval, 2018-06-26
DOI: 10.1145/3230599.3230604 (https://doi.org/10.1145/3230599.3230604)
Abstract
The sheer volume of information available today can easily overwhelm users. Information retrieval techniques help users find what they need, but open challenges remain in this research area. One example, common to many application domains, is minimizing the time a user spends locating specific information in the unstructured text of the retrieved documents. Supervised learning-based information extraction techniques can address this problem. However, a supervised learning model requires a large labeled dataset as input, generated manually by experts. Moreover, very few information extraction frameworks currently make it possible to reduce or avoid the human effort needed to label such training datasets. In this paper, we present our work in progress towards an information retrieval system that will display structured, centralized and up-to-date information extracted from calls for public examinations. In this scenario, the search engine should display not only the documents relevant to the user's query, but also specific data contained in those documents. In addition, we present a study of frameworks that can be used in this context, as well as our preliminary experience with the Snorkel framework. In the future, we plan to complete our proposal and extend it to other types of documents published in Spanish official bulletins.
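
The idea of reducing manual labeling effort can be illustrated with a minimal sketch of weak supervision: several noisy heuristics ("labeling functions") vote on each sentence, and their votes are combined into a training label. This is an illustration only, not the authors' pipeline and not Snorkel's API. The function names, the Spanish keywords, the target field (number of positions offered), and the sample sentences are all hypothetical. Snorkel combines votes with a learned label model; a plain majority vote is swapped in here for brevity.

```python
import re
from collections import Counter

# Label values (an assumption made for this sketch).
ABSTAIN, NO, YES = -1, 0, 1

def lf_number_of_positions(sentence):
    # Hypothetical heuristic: a number followed by "plazas" ("positions").
    return YES if re.search(r"\b\d+\s+plazas\b", sentence, re.IGNORECASE) else ABSTAIN

def lf_deadline(sentence):
    # Hypothetical heuristic: sentences about a deadline ("plazo")
    # are assumed not to state the number of positions.
    return NO if re.search(r"\bplazo\b", sentence, re.IGNORECASE) else ABSTAIN

def lf_call_verb(sentence):
    # Hypothetical heuristic: "se convoca(n)" often introduces the positions.
    return YES if re.search(r"\bse convocan?\b", sentence, re.IGNORECASE) else ABSTAIN

LABELING_FUNCTIONS = [lf_number_of_positions, lf_deadline, lf_call_verb]

def majority_label(sentence):
    """Combine labeling-function votes by majority; abstain when no function votes or on ties."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]

if __name__ == "__main__":
    # Made-up example sentences, not taken from any real bulletin.
    samples = [
        "Se convocan 15 plazas de auxiliar administrativo.",
        "El plazo de presentación es de 20 días.",
        "Publicado en el boletín oficial.",
    ]
    for s in samples:
        print(majority_label(s), s)
```

Sentences that receive a non-abstain label could then serve as training data for a supervised extractor, so that experts write a handful of heuristics instead of labeling every example by hand.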