Proces. del Leng. Natural: Latest Publications

Multilingual Controllable Transformer-Based Lexical Simplification
Proces. del Leng. Natural Pub Date: 2023-07-05 DOI: 10.48550/arXiv.2307.02120
Sheang Cheng Kim, Horacio Saggion
Abstract: Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Suggesting simpler alternatives for complex words without compromising meaning would therefore help convey the information to a broader audience. This paper proposes mTLS, a multilingual controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model. The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pre-trained masked language models to learn simpler alternatives for complex words. Evaluation results on three well-known LS datasets (LexMTurk, BenchLS, and NNSEval) show that our model outperforms previous state-of-the-art models such as LSBert and ConLS. Further evaluation on part of the recent TSAR-2022 multilingual LS shared-task dataset shows that our model performs competitively against the participating systems for English LS and even outperforms the GPT-3 model on several metrics. Our model also obtains performance gains for Spanish and Portuguese.
Citations: 0

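The control-token mechanism described in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: the prefix format, control-token names, and value buckets are assumptions, and a generic t5-small checkpoint stands in for the fine-tuned mTLS model.

```python
# Minimal sketch of a control-token prompt for T5-based lexical simplification.
# The prefix and token names below are illustrative assumptions, not the exact
# vocabulary used by mTLS; candidate hints would come from a masked LM.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-small"  # stand-in; mTLS fine-tunes its own T5 checkpoint
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

sentence = "The cat perched on the mat."
complex_word = "perched"

# Language-specific prefix plus control tokens that condition the generation.
input_text = (
    "simplify en: "                      # assumed language prefix format
    "<WordLength_0.6> <WordRank_0.3> "   # assumed control tokens and buckets
    f"{sentence} The complex word is: {complex_word}"
)

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(
    **inputs, num_beams=5, num_return_sequences=3, max_new_tokens=10
)
for o in outputs:
    print(tokenizer.decode(o, skip_special_tokens=True))
```
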
Deep Learning Methods for Extracting Metaphorical Names of Flowers and Plants
Proces. del Leng. Natural Pub Date: 2023-05-18 DOI: 10.48550/arXiv.2305.10833
A. Haddad, Damith Premasiri, Tharindu Ranasinghe, R. Mitkov
Abstract: The domain of botany is rich in metaphorical terms, which play an important role in the description and identification of flowers and plants. However, identifying such terms in discourse is an arduous task, and in some cases this leads to errors in translation and lexicographic work. The process is even more challenging for machine translation, for both single-word and multi-word terms. A recent concern of Natural Language Processing (NLP) applications and Machine Translation (MT) technologies is the automatic identification of metaphor-based words in discourse through Deep Learning (DL). In this study, we seek to fill this gap using thirteen popular transformer-based models as well as ChatGPT, and we show that discriminative models perform better than the GPT-3.5 model, with our best performer reaching a 92.2349% F1 score on the metaphorical flower and plant name identification task.
Citations: 1

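As a rough illustration of the discriminative setup the paper compares against GPT-3.5, the sketch below treats metaphor identification as binary sequence classification with a generic pre-trained transformer. The checkpoint, label scheme, and toy examples are assumptions; the study evaluates thirteen different transformer models on an annotated corpus of botanical terms.

```python
# Minimal sketch: fine-tuning a transformer as a literal-vs-metaphorical
# classifier over candidate flower/plant names in context. Checkpoint and
# labels are placeholders, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # assumed: 0 = literal, 1 = metaphorical
)

# Toy examples; the study fine-tunes on an annotated corpus.
texts = ["bleeding heart is in bloom", "the patient had a bleeding heart valve"]
labels = torch.tensor([1, 0])

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
out = model(**enc, labels=labels)
out.loss.backward()  # one illustrative training step (optimizer omitted)
print(out.logits.argmax(dim=-1))
```
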
Aligning a medium-size GPT model in English to a small closed domain in Spanish using reinforcement learning
Proces. del Leng. Natural Pub Date: 2023-03-30 DOI: 10.48550/arXiv.2303.17649
Oscar R. Navarrete-Parra, Víctor Uc Cetina, Jorge Reyes-Magaña
Abstract: In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish, fine-tuning it for the question-answering task. To achieve this, we also trained a second neural network (the reward model) to score whether an answer is appropriate for a given question; this component improved the decoding and generation of the system's answers. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was used to compare the decoding technique with others. The results favored the proposed method, showing that it is feasible to use a reward model to align response generation.
Citations: 0

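A minimal sketch of the decoding strategy the abstract describes: generate several candidate answers, score each with a reward model, and keep the highest-scoring one. The model names are stand-ins, and the untrained regression head below only shows the wiring; the paper trains its reward model so that scores reflect answer appropriateness.

```python
# Minimal sketch of reward-model reranking at decoding time. gpt2 and the
# BERT-based scorer are stand-ins for the paper's models; the reward head
# here is untrained, so scores are meaningless until fine-tuned.
import torch
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForSequenceClassification)

gen_tok = AutoTokenizer.from_pretrained("gpt2")
generator = AutoModelForCausalLM.from_pretrained("gpt2")

# A reward model can be any network mapping (question, answer) to a score;
# a sequence classifier with a single regression head stands in for it.
rm_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=1
)

question = "¿Cuál es la capital de Yucatán?"
inputs = gen_tok(question, return_tensors="pt")
candidates = generator.generate(
    **inputs, do_sample=True, num_return_sequences=4, max_new_tokens=30,
    pad_token_id=gen_tok.eos_token_id,
)
answers = [gen_tok.decode(c, skip_special_tokens=True) for c in candidates]

# Score each candidate with the reward model and keep the best one.
with torch.no_grad():
    scores = [
        reward_model(
            **rm_tok(question, a, truncation=True, return_tensors="pt")
        ).logits.item()
        for a in answers
    ]
print(answers[max(range(len(answers)), key=scores.__getitem__)])
```
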
Número 61
Proces. del Leng. Natural Pub Date: 2022-12-28 DOI: 10.18537/auc.61
VV.AA
Abstract: Complete issue number 61.
Citations: 0

Lessons learned from the evaluation of Spanish Language Models
Proces. del Leng. Natural Pub Date: 2022-12-16 DOI: 10.48550/arXiv.2212.08390
Rodrigo Agerri, Eneko Agirre
Abstract: Given the impact of language models on the field of Natural Language Processing, a number of Spanish encoder-only masked language models (aka BERTs) have been trained and released. These models were developed either within large projects using very large private corpora or by smaller-scale academic efforts leveraging freely available data. In this paper we present a comprehensive head-to-head comparison of language models for Spanish with the following results: (i) previously ignored multilingual models from large companies fare better than monolingual models, substantially changing the evaluation landscape of language models in Spanish; (ii) results across the monolingual models are not conclusive, with supposedly smaller and inferior models performing competitively. Based on these empirical results, we argue that more research is needed to understand the factors underlying them. In particular, the effects of corpus size, quality, and pre-training techniques need to be further investigated in order to obtain Spanish monolingual models significantly better than the multilingual ones released by large private companies, especially in the face of rapid ongoing progress in the field. The recent activity in the development of language technology for Spanish is to be welcomed, but our results show that building language models remains an open, resource-heavy problem which requires marrying resources (monetary and/or computational) with the best research expertise and practice.
Citations: 2

BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling
Proces. del Leng. Natural Pub Date: 2022-07-14 DOI: 10.48550/arXiv.2207.06814
Javier de la Rosa, E. G. Ponferrada, Paulo Villegas, Pablo González de Prado Salas, Manu Romero, María Grandury
Abstract: The pre-training of large language models usually requires massive amounts of resources, both in terms of computation and data. Frequently used web sources such as Common Crawl may contain enough noise to make this pre-training sub-optimal. In this work, we experiment with different sampling methods from the Spanish version of mC4, and present a novel data-centric technique which we name perplexity sampling, enabling the pre-training of language models in roughly half the number of steps and using one fifth of the data. The resulting models are comparable to the current state of the art, and even achieve better results on certain tasks. Our work is proof of the versatility of Transformers and paves the way for small teams to train models on a limited budget. Our models are available at https://huggingface.co/bertin-project.
Citations: 44

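A minimal sketch of perplexity sampling under stated assumptions: each document is scored with a reference language model and accepted with a probability that peaks in a chosen perplexity band. The Gaussian weighting and its parameters below are illustrative choices, and a small GPT-2 stands in for a much faster reference model; BERTIN derives its acceptance function from the empirical perplexity distribution of mC4.

```python
# Minimal sketch of perplexity sampling: score documents with a reference LM,
# then keep each one with a probability weighted toward a mid-range perplexity
# band. Weighting function, mu, and sigma are illustrative assumptions.
import math
import random
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in reference model
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def perplexity(text: str) -> float:
    """Per-document perplexity under the reference language model."""
    enc = tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        loss = lm(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

def keep(ppl: float, mu: float = 50.0, sigma: float = 20.0) -> bool:
    # Acceptance probability peaks at perplexity mu and decays away from it,
    # filtering out both boilerplate-like (low) and noisy (high) documents.
    weight = math.exp(-((ppl - mu) ** 2) / (2 * sigma ** 2))
    return random.random() < weight

docs = ["Una oración bien formada en español.", "qwe rty 12345 !!! zzz"]
sample = [d for d in docs if keep(perplexity(d))]
print(sample)
```
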
Número 60
Proces. del Leng. Natural Pub Date: 2021-12-17 DOI: 10.18537/auc.60
VV.AA
Abstract: Complete issue number 60.
Citations: 0

NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts
Proces. del Leng. Natural Pub Date: 2021-09-06 DOI: 10.26342/2021-67-21
Salvador Lima-López, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Briva-Iglesias, Martin Krallinger
Abstract: Among socio-demographic patient characteristics, occupations play an important role not only in occupational health, work-related accidents, and exposure to toxic or pathogenic agents, but also through their impact on general physical and mental health. This paper presents the Medical Documents Profession Recognition (MEDDOPROF) shared task (held within IberLEF/SEPLN 2021), focused on the recognition and normalization of occupations in medical documents in Spanish. MEDDOPROF proposes three challenges: NER (recognition of professions, employment statuses, and activities in text), CLASS (classifying each occupation mention by its holder, i.e., patient or family member), and NORM (normalizing mentions to their identifier in ESCO or SNOMED CT). Of the 40 registered teams, 15 submitted a total of 94 runs across the various sub-tracks. The best-performing systems were based on deep-learning technologies (including transformers) and achieved a 0.818 F-score in occupation detection (NER), 0.793 in classifying occupations by their referent (CLASS), and 0.619 in normalization (NORM). Future initiatives should also address multilingual aspects and applications to other domains such as social services, human resources, legal or job-market data analytics, and policy making.
Citations: 20

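To make the NER sub-task concrete, here is a minimal sketch of occupation recognition framed as transformer token classification with BIO labels. The checkpoint and label names are assumptions, not a participating system; before fine-tuning on MEDDOPROF data the predictions are meaningless, so this only shows the input/output shape of the task.

```python
# Minimal sketch of the MEDDOPROF NER sub-task as token classification.
# Label set and checkpoint (BETO, a Spanish BERT) are illustrative choices.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          pipeline)

labels = ["O", "B-PROFESION", "I-PROFESION"]  # assumed BIO scheme
model_name = "dccuchile/bert-base-spanish-wwm-cased"
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")
# The classification head is randomly initialized here; a real system would
# fine-tune on the annotated shared-task corpus before inference.
print(ner("Paciente de 54 años, trabaja como conductor de ambulancia."))
```
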
El sentimiento de las letras de las canciones y su relación con las características musicales (The sentiment of song lyrics and its relationship with musical features)
Proces. del Leng. Natural Pub Date: 2021-09-06 DOI: 10.26342/2021-67-8
M. Palomeque, J. Lucio
Abstract: The authors acknowledge funding received from the Comunidad de Madrid and the UAH (ref: EPU-INV/2020/006).
Citations: 1

Impact of Text Length for Information Retrieval Tasks based on Probabilistic Topics
Proces. del Leng. Natural Pub Date: 2021-09-06 DOI: 10.26342/2021-67-2
Carlos Badenes-Olmedo, Borja Lozano-Álvarez, Óscar Corcho
Abstract: This work is supported by the KnowledgeSpaces project (reference PID2020-118274RB-I00), funded by the Spanish Ministry of Science and Innovation.
Citations: 0