F. Plaza-Del-Arco, S. M. J. Zafra, Arturo Montejo Ráez, M. González, L. A. U. López, M. T. M. Valdivia
{"title":"Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021","authors":"F. Plaza-Del-Arco, S. M. J. Zafra, Arturo Montejo Ráez, M. González, L. A. U. López, M. T. M. Valdivia","doi":"10.26342/2021-67-13","DOIUrl":"https://doi.org/10.26342/2021-67-13","url":null,"abstract":"This work has been partially supported by a grant from Fondo Social Europeo, Administration of the Junta de Andalucia (DOC 01073 and P20 00956-PAIDI 2020), Fondo Europeo de Desarrollo Regional (FEDER), LIVING-LANG project (RTI2018-094653-B-C21) and the Ministry of Science, Innovation and Universities (scholarship [FPI-PRE2019-089310]) from the Spanish Government.","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125254713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Taulé, Alejandro Ariza-Casabona, Montserrat Nofre, Enrique Amigó, Paolo Rosso
{"title":"Overview of DETOXIS at IberLEF 2021: DEtection of TOXicity in comments In Spanish","authors":"M. Taulé, Alejandro Ariza-Casabona, Montserrat Nofre, Enrique Amigó, Paolo Rosso","doi":"10.26342/2021-67-18","DOIUrl":"https://doi.org/10.26342/2021-67-18","url":null,"abstract":"In this paper we present the DETOXIS task, DEtection of TOxicity in comments In Spanish, which took place as part of the IberLEF 2021 Workshop on Iberian Languages Evaluation Forum at the SEPLN 2021 Conference. We describe the NewsCom-TOX dataset used for training and testing the systems, the metrics applied for their evaluation and the results obtained by the submitted approaches. We also provide an error analysis of the results of these systems.","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121887697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inducción automática de una taxonomía multilingüe de marcadores discursivos: primeros resultados en castellano, inglés, francés, alemán y catalán","authors":"Rogelio Nazar","doi":"10.26342/2021-67-11","DOIUrl":"https://doi.org/10.26342/2021-67-11","url":null,"abstract":"This research has been funded by the Government of Chile through the Fondecyt Regular Project 1191481: Automatic induction of discourse marker taxonomies from multilingual corpora (2019-2021).","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125165596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rodrigo Agerri Gascón, Roberto Centeno, María S. Espinosa, Joseba Fernandez de Landa, Álvaro Rodrigo Yuste
{"title":"VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection","authors":"Rodrigo Agerri Gascón, Roberto Centeno, María S. Espinosa, Joseba Fernandez de Landa, Álvaro Rodrigo Yuste","doi":"10.26342/2021-67-15","DOIUrl":"https://doi.org/10.26342/2021-67-15","url":null,"abstract":"This work has been partially supported by the European Social Fund through the Youth Employment Initiative (YEI 2019) and the Spanish Ministry of Science, Innovation and Universities (DeepReading RTI2018-096846-B-C21, MCIU/AEI/FEDER, UE), and by the DeepText project (KK-2020/00088), funded by the Basque Government. Rodrigo Agerri is also funded by the RYC-2017-23647 fellowship.","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125908495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harold González-Guerra, Alfredo Simón-Cuevas, J. Ortega, J. A. Olivas
{"title":"Un enfoque semántico en la selección de características basadas en léxico para la detección de emociones","authors":"Harold González-Guerra, Alfredo Simón-Cuevas, J. Ortega, J. A. Olivas","doi":"10.26342/2021-67-10","DOIUrl":"https://doi.org/10.26342/2021-67-10","url":null,"abstract":"This work has been partially funded by the European Regional Development Fund (FEDER), the Junta de Extremadura (GR18135), and the Spanish Ministry of Science, Innovation and Universities through the SAFER project (PID2019-104735RB-C42).","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130030010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
U. Corrêa, Leonardo Coelho, Leonardo Pereira dos Santos, L. Freitas
{"title":"Overview of the IDPT Task on Irony Detection in Portuguese at IberLEF 2021","authors":"U. Corrêa, Leonardo Coelho, Leonardo Pereira dos Santos, L. Freitas","doi":"10.26342/2021-67-23","DOIUrl":"https://doi.org/10.26342/2021-67-23","url":null,"abstract":"This paper presents the Task on Irony Detection in Portuguese (IDPT), held within the Iberian Languages Evaluation Forum (IberLEF 2021). We asked the participants to develop systems capable of identifying irony in texts. We created two corpora containing tweets and news articles. Twelve teams registered for the task, six of which submitted both predictions and technical reports. The best performing system achieved a Balanced Accuracy (Bacc) value of 0.52 for tweets (Team PiLN) and 0.92 for news (Team BERT4EVER).","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127996248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ander González-Docasal, Aitor García-Pablos, Haritz Arzelus, Aitor Álvarez
{"title":"AutoPunct: A BERT-based Automatic Punctuation and Capitalisation System for Spanish and Basque","authors":"Ander González-Docasal, Aitor García-Pablos, Haritz Arzelus, Aitor Álvarez","doi":"10.26342/2021-67-5","DOIUrl":"https://doi.org/10.26342/2021-67-5","url":null,"abstract":"The raw output of an Automatic Speech Recognition system usually consists of a stream of words without any casing or punctuation. To improve the readability of this output and enable further uses of it, punctuation and capitalisation have to be restored. In this context, we present AutoPunct, a Transformers-based automatic punctuation and capitalisation model that combines both acoustic information (i.e. silence duration) and lexical information (the words themselves). We compared its performance with a system based on Bidirectional Recurrent Neural Networks (BRNN) on Basque (a low-resource language) and Spanish, both individually and simultaneously. The result is a system that achieves high accuracy for punctuation and capitalisation in both languages at the same time, with a throughput of several thousand words per second using a standard GPU.","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134533092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Víctor Manuel Darriba Bilbao, Yerai Doval, Elmurod Kuriyozov
{"title":"Procesamiento de Expresiones Multipalabra en gallego mediante Aprendizaje Profundo","authors":"Víctor Manuel Darriba Bilbao, Yerai Doval, Elmurod Kuriyozov","doi":"10.26342/2021-67-4","DOIUrl":"https://doi.org/10.26342/2021-67-4","url":null,"abstract":"This work has been partially funded by the Xunta de Galicia, through the multi-year collaboration agreement between the Centro Ramón Piñeiro para la Investigación en Humanidades and the Universidad de Vigo, and through the grant for the Consolidation and Structuring of Competitive Research Units ED431C 2018/50, and by the Ministry of Economy, Industry and Competitiveness through project TIN2017-85160-C2-2-R.","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"400 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116230702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marc Pàmies, Joan Llop-Palao, Joaquín Silveira-Ocampo, C. Carrino, Carme Armentano-Oller, C. R. Penagos, Aitor Gonzalez-Agirre, Marta Villegas
{"title":"MarIA: Spanish Language Models","authors":"Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marc Pàmies, Joan Llop-Palao, Joaquín Silveira-Ocampo, C. Carrino, Carme Armentano-Oller, C. R. Penagos, Aitor Gonzalez-Agirre, Marta Villegas","doi":"10.26342/2022-68-3","DOIUrl":"https://doi.org/10.26342/2022-68-3","url":null,"abstract":"This work presents MarIA, a family of Spanish language models and associated resources made available to the industry and the research community. Currently, MarIA includes RoBERTa-base, RoBERTa-large, GPT2 and GPT2-large Spanish language models, which can arguably be presented as the largest and most proficient language models in Spanish. The models were pretrained using a massive corpus of 570GB of clean and deduplicated texts with 135 billion words extracted from the Spanish Web Archive crawled by the National Library of Spain between 2009 and 2019. We assessed the performance of the models with nine existing evaluation datasets and with a novel extractive Question Answering dataset created ex novo. Overall, MarIA models outperform the existing Spanish models across a variety of NLU tasks and training settings.","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126281048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Vilares, Marcos Garcia, Carlos Gómez-Rodríguez
{"title":"Bertinho: Galician BERT Representations","authors":"David Vilares, Marcos Garcia, Carlos Gómez-Rodríguez","doi":"10.26342/2021-66-1","DOIUrl":"https://doi.org/10.26342/2021-66-1","url":null,"abstract":"This paper presents a monolingual BERT model for Galician. We follow the recent trend that shows that it is feasible to build robust monolingual BERT models even for relatively low-resource languages, while performing better than the well-known official multilingual BERT (mBERT). More particularly, we release two monolingual Galician BERT models, built using 6 and 12 transformer layers, respectively; trained with limited resources (~45 million tokens on a single GPU of 24GB). We then provide an exhaustive evaluation on a number of tasks such as POS-tagging, dependency parsing and named entity recognition. For this purpose, all these tasks are cast in a pure sequence labeling setup in order to run BERT without the need to include any additional layers on top of it (we only use an output classification layer to map the contextualized representations into the predicted label). The experiments show that our models, especially the 12-layer one, outperform the results of mBERT in most tasks.","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131203482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}