Proces. del Leng. Natural: Latest Publications

Multilingual Controllable Transformer-Based Lexical Simplification
Proces. del Leng. Natural Pub Date: 2023-07-05 DOI: 10.48550/arXiv.2307.02120
Sheang Cheng Kim, Horacio Saggion
Abstract: Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Suggesting simpler alternatives for complex words without compromising meaning would therefore help convey the information to a broader audience. This paper proposes mTLS, a multilingual controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model. The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pre-trained masked language models to learn simpler alternatives for complex words. Evaluation results on three well-known LS datasets (LexMTurk, BenchLS, and NNSEval) show that our model outperforms previous state-of-the-art models such as LSBert and ConLS. Further evaluation on part of the recent TSAR-2022 multilingual LS shared-task dataset shows that our model performs competitively against the participating systems for English LS and even outperforms the GPT-3 model on several metrics. Our model also obtains performance gains for Spanish and Portuguese.
Citations: 0

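The control-token mechanism described in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: the prefix format, control-token names, and value buckets are assumptions, and a generic t5-small checkpoint stands in for the fine-tuned mTLS model.

```python
# Minimal sketch of a control-token prompt for T5-based lexical simplification.
# The prefix and token names below are illustrative assumptions, not the exact
# vocabulary used by mTLS; candidate hints would come from a masked LM.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-small"  # stand-in; mTLS fine-tunes its own T5 checkpoint
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

sentence = "The cat perched on the mat."
complex_word = "perched"

# Language-specific prefix plus control tokens that condition the generation.
input_text = (
    "simplify en: "                      # assumed language prefix format
    "<WordLength_0.6> <WordRank_0.3> "   # assumed control tokens and buckets
    f"{sentence} The complex word is: {complex_word}"
)

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(
    **inputs, num_beams=5, num_return_sequences=3, max_new_tokens=10
)
for o in outputs:
    print(tokenizer.decode(o, skip_special_tokens=True))
```
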
Deep Learning Methods for Extracting Metaphorical Names of Flowers and Plants
Proces. del Leng. Natural Pub Date: 2023-05-18 DOI: 10.48550/arXiv.2305.10833
A. Haddad, Damith Premasiri, Tharindu Ranasinghe, R. Mitkov
Abstract: The domain of botany is rich in metaphorical terms, which play an important role in the description and identification of flowers and plants. However, identifying such terms in discourse is an arduous task, and in some cases this leads to errors in translation and lexicographic work. The process is even more challenging for machine translation, for both single-word and multi-word terms. A recent concern of Natural Language Processing (NLP) applications and Machine Translation (MT) technologies is the automatic identification of metaphor-based words in discourse through Deep Learning (DL). In this study, we seek to fill this gap using thirteen popular transformer-based models as well as ChatGPT, and we show that discriminative models perform better than the GPT-3.5 model, with our best performer reaching a 92.2349% F1 score on the metaphorical flower and plant name identification task.
Citations: 1

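As a rough illustration of the discriminative setup the paper compares against GPT-3.5, the sketch below treats metaphor identification as binary sequence classification with a generic pre-trained transformer. The checkpoint, label scheme, and toy examples are assumptions; the study evaluates thirteen different transformer models on an annotated corpus of botanical terms.

```python
# Minimal sketch: fine-tuning a transformer as a literal-vs-metaphorical
# classifier over candidate flower/plant names in context. Checkpoint and
# labels are placeholders, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # assumed: 0 = literal, 1 = metaphorical
)

# Toy examples; the study fine-tunes on an annotated corpus.
texts = ["bleeding heart is in bloom", "the patient had a bleeding heart valve"]
labels = torch.tensor([1, 0])

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
out = model(**enc, labels=labels)
out.loss.backward()  # one illustrative training step (optimizer omitted)
print(out.logits.argmax(dim=-1))
```
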
Aligning a medium-size GPT model in English to a small closed domain in Spanish using reinforcement learning
Proces. del Leng. Natural Pub Date: 2023-03-30 DOI: 10.48550/arXiv.2303.17649
Oscar R. Navarrete-Parra, Víctor Uc Cetina, Jorge Reyes-Magaña
Abstract: In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish, fine-tuning it for the question-answering task. To achieve this, we also trained a second neural network (the reward model) to score whether an answer is appropriate for a given question; this component improved the decoding and generation of the system's answers. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was used to compare the decoding technique with others. The results favored the proposed method, showing that it is feasible to use a reward model to align response generation.
Citations: 0

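A minimal sketch of the decoding strategy the abstract describes: generate several candidate answers, score each with a reward model, and keep the highest-scoring one. The model names are stand-ins, and the untrained regression head below only shows the wiring; the paper trains its reward model so that scores reflect answer appropriateness.

```python
# Minimal sketch of reward-model reranking at decoding time. gpt2 and the
# BERT-based scorer are stand-ins for the paper's models; the reward head
# here is untrained, so scores are meaningless until fine-tuned.
import torch
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForSequenceClassification)

gen_tok = AutoTokenizer.from_pretrained("gpt2")
generator = AutoModelForCausalLM.from_pretrained("gpt2")

# A reward model can be any network mapping (question, answer) to a score;
# a sequence classifier with a single regression head stands in for it.
rm_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=1
)

question = "¿Cuál es la capital de Yucatán?"
inputs = gen_tok(question, return_tensors="pt")
candidates = generator.generate(
    **inputs, do_sample=True, num_return_sequences=4, max_new_tokens=30,
    pad_token_id=gen_tok.eos_token_id,
)
answers = [gen_tok.decode(c, skip_special_tokens=True) for c in candidates]

# Score each candidate with the reward model and keep the best one.
with torch.no_grad():
    scores = [
        reward_model(
            **rm_tok(question, a, truncation=True, return_tensors="pt")
        ).logits.item()
        for a in answers
    ]
print(answers[max(range(len(answers)), key=scores.__getitem__)])
```
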
Número 61
Proces. del Leng. Natural Pub Date: 2022-12-28 DOI: 10.18537/auc.61
VV.AA
Abstract: Complete issue number 61.
Citations: 0

Lessons learned from the evaluation of Spanish Language Models
Proces. del Leng. Natural Pub Date: 2022-12-16 DOI: 10.48550/arXiv.2212.08390
Rodrigo Agerri, Eneko Agirre
Abstract: Given the impact of language models on the field of Natural Language Processing, a number of Spanish encoder-only masked language models (aka BERTs) have been trained and released. These models were developed either within large projects using very large private corpora or by smaller-scale academic efforts leveraging freely available data. In this paper we present a comprehensive head-to-head comparison of language models for Spanish with the following results: (i) previously ignored multilingual models from large companies fare better than monolingual models, substantially changing the evaluation landscape of language models in Spanish; (ii) results across the monolingual models are not conclusive, with supposedly smaller and inferior models performing competitively. Based on these empirical results, we argue that more research is needed to understand the factors underlying them. In particular, the effects of corpus size, quality, and pre-training techniques need to be further investigated in order to obtain Spanish monolingual models significantly better than the multilingual ones released by large private companies, especially in the face of rapid ongoing progress in the field. The recent activity in the development of language technology for Spanish is to be welcomed, but our results show that building language models remains an open, resource-heavy problem which requires marrying resources (monetary and/or computational) with the best research expertise and practice.
Citations: 2

BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling
Proces. del Leng. Natural Pub Date: 2022-07-14 DOI: 10.48550/arXiv.2207.06814
Javier de la Rosa, E. G. Ponferrada, Paulo Villegas, Pablo González de Prado Salas, Manu Romero, María Grandury
Abstract: The pre-training of large language models usually requires massive amounts of resources, both in terms of computation and data. Frequently used web sources such as Common Crawl may contain enough noise to make this pre-training sub-optimal. In this work, we experiment with different sampling methods from the Spanish version of mC4, and present a novel data-centric technique which we name perplexity sampling, enabling the pre-training of language models in roughly half the number of steps and using one fifth of the data. The resulting models are comparable to the current state of the art, and even achieve better results on certain tasks. Our work is proof of the versatility of Transformers and paves the way for small teams to train models on a limited budget. Our models are available at https://huggingface.co/bertin-project.
Citations: 44

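A minimal sketch of perplexity sampling under stated assumptions: each document is scored with a reference language model and accepted with a probability that peaks in a chosen perplexity band. The Gaussian weighting and its parameters below are illustrative choices, and a small GPT-2 stands in for a much faster reference model; BERTIN derives its acceptance function from the empirical perplexity distribution of mC4.

```python
# Minimal sketch of perplexity sampling: score documents with a reference LM,
# then keep each one with a probability weighted toward a mid-range perplexity
# band. Weighting function, mu, and sigma are illustrative assumptions.
import math
import random
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in reference model
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def perplexity(text: str) -> float:
    """Per-document perplexity under the reference language model."""
    enc = tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        loss = lm(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

def keep(ppl: float, mu: float = 50.0, sigma: float = 20.0) -> bool:
    # Acceptance probability peaks at perplexity mu and decays away from it,
    # filtering out both boilerplate-like (low) and noisy (high) documents.
    weight = math.exp(-((ppl - mu) ** 2) / (2 * sigma ** 2))
    return random.random() < weight

docs = ["Una oración bien formada en español.", "qwe rty 12345 !!! zzz"]
sample = [d for d in docs if keep(perplexity(d))]
print(sample)
```
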
Número 60
Proces. del Leng. Natural Pub Date: 2021-12-17 DOI: 10.18537/auc.60
VV.AA
Abstract: Complete issue number 60.
Citations: 0

NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts
Proces. del Leng. Natural Pub Date: 2021-09-06 DOI: 10.26342/2021-67-21
Salvador Lima-López, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Briva-Iglesias, Martin Krallinger
Abstract: Among socio-demographic patient characteristics, occupations play an important role not only in occupational health, work-related accidents, and exposure to toxic or pathogenic agents, but also through their impact on general physical and mental health. This paper presents the Medical Documents Profession Recognition (MEDDOPROF) shared task (held within IberLEF/SEPLN 2021), focused on the recognition and normalization of occupations in medical documents in Spanish. MEDDOPROF proposes three challenges: NER (recognition of professions, employment statuses, and activities in text), CLASS (classifying each occupation mention by its holder, i.e., patient or family member), and NORM (normalizing mentions to their identifier in ESCO or SNOMED CT). Of the 40 registered teams, 15 submitted a total of 94 runs across the various sub-tracks. The best-performing systems were based on deep-learning technologies (including transformers) and achieved a 0.818 F-score in occupation detection (NER), 0.793 in classifying occupations by their referent (CLASS), and 0.619 in normalization (NORM). Future initiatives should also address multilingual aspects and applications to other domains such as social services, human resources, legal or job-market data analytics, and policy making.
Citations: 20

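To make the NER sub-task concrete, here is a minimal sketch of occupation recognition framed as transformer token classification with BIO labels. The checkpoint and label names are assumptions, not a participating system; before fine-tuning on MEDDOPROF data the predictions are meaningless, so this only shows the input/output shape of the task.

```python
# Minimal sketch of the MEDDOPROF NER sub-task as token classification.
# Label set and checkpoint (BETO, a Spanish BERT) are illustrative choices.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          pipeline)

labels = ["O", "B-PROFESION", "I-PROFESION"]  # assumed BIO scheme
model_name = "dccuchile/bert-base-spanish-wwm-cased"
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")
# The classification head is randomly initialized here; a real system would
# fine-tune on the annotated shared-task corpus before inference.
print(ner("Paciente de 54 años, trabaja como conductor de ambulancia."))
```
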
El sentimiento de las letras de las canciones y su relación con las características musicales (The sentiment of song lyrics and its relationship with musical features)
Proces. del Leng. Natural Pub Date: 2021-09-06 DOI: 10.26342/2021-67-8
M. Palomeque, J. Lucio
Abstract: The authors acknowledge funding received from the Comunidad de Madrid and the UAH (ref: EPU-INV/2020/006).
Citations: 1

Impact of Text Length for Information Retrieval Tasks based on Probabilistic Topics
Proces. del Leng. Natural Pub Date: 2021-09-06 DOI: 10.26342/2021-67-2
Carlos Badenes-Olmedo, Borja Lozano-Álvarez, Óscar Corcho
Abstract: This work is supported by the KnowledgeSpaces project (reference PID2020-118274RB-I00), funded by the Spanish Ministry of Science and Innovation.
Citations: 0