Northern European Journal of Language Technology: Latest Articles

DANSK: Domain Generalization of Danish Named Entity Recognition

Northern European Journal of Language Technology Pub Date : 2024-07-23 DOI: 10.3384/nejlt.2000-1533.2024.5249

K. Enevoldsen, Emil Trenckner Jessen, Rebekah Baglini

Abstract: Named entity recognition is an important application within Danish NLP, essential within both industry and research. However, Danish NER is inhibited by a lack of coverage across domains and entity types. As a consequence, no current models are capable of fine-grained named entity recognition, nor have they been evaluated for potential generalizability issues across datasets and domains. To alleviate these limitations, this paper introduces: 1) DANSK, a named entity dataset providing for high-granularity tagging as well as within-domain evaluation of models across a diverse set of domains; 2) three generalizable models with fine-grained annotation, available in DaCy 2.6.0; and 3) an evaluation of current state-of-the-art models' ability to generalize across domains. The evaluation of existing and new models revealed notable performance discrepancies across domains, which should be addressed within the field. Shortcomings of the annotation quality of the dataset and its impact on model training and evaluation are also discussed. Despite these limitations, we advocate for the use of the new dataset DANSK alongside further work on generalizability within Danish NER.

Citations: 0

Efficient Structured Prediction with Transformer Encoders

Northern European Journal of Language Technology Pub Date : 2024-03-14 DOI: 10.3384/nejlt.2000-1533.2024.4932

Ali Basirat

Citations: 0

QUA-RC: the semi-synthetic dataset of multiple choice questions for assessing reading comprehension in Ukrainian

Northern European Journal of Language Technology Pub Date : 2023-11-16 DOI: 10.3384/nejlt.2000-1533.2023.4939

M. Zyrianova, Dmytro Kalpakchi

Citations: 0

Resource papers as registered reports: a proposal

Northern European Journal of Language Technology Pub Date : 2023-07-13 DOI: 10.3384/nejlt.2000-1533.2023.4884

Emiel van Miltenburg

Citations: 0

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Northern European Journal of Language Technology Pub Date : 2023-04-08 DOI: 10.3384/nejlt.2000-1533.2023.4725

Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahadiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshang Wu, Jascha Narain Sohl-Dickstein, Jinho Choi, Eduard Hovy, Ondřej Dušek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, E. Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sanghyun Han, Fabrice Harel-Canada, Antoine Honoré, Ishan Jindal, Przemyslaw K. Joniak, D. Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey J. Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, V. Marivate, Gerard De Melo, Simon Meoni, Maxine Meyer,

Abstract: Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human mistakes, socio-linguistic variation, semantically-valid style, syntax changes, as well as artificial constructs that are unambiguous to humans. We demonstrate the efficacy of NL-Augmenter by using its transformations to analyze the robustness of popular language models. We find different models to be differently challenged on different tasks, with quasi-systematic score decreases. The infrastructure, datacards, and robustness evaluation results are publicly available on GitHub for the benefit of researchers working on paraphrase generation, robustness analysis, and low-resource NLP.

Citations: 0

Benchmark for Evaluation of Danish Clinical Word Embeddings

Northern European Journal of Language Technology Pub Date : 2023-03-01 DOI: 10.3384/nejlt.2000-1533.2023.4132

M. Laursen, J. Pedersen, P. Vinholt, R. Hansen, T. Savarimuthu

Citations: 2

Barriers and enabling factors for error analysis in NLG research

Northern European Journal of Language Technology Pub Date : 2023-02-21 DOI: 10.3384/nejlt.2000-1533.2023.4529

Emiel van Miltenburg, Miruna Clinciu, Ondrej Dusek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, S. Schoch, Craig Thomson, Luou Wen

Citations: 3

PARSEME Meets Universal Dependencies: Getting on the Same Page in Representing Multiword Expressions

Northern European Journal of Language Technology Pub Date : 2023-02-21 DOI: 10.3384/nejlt.2000-1533.2023.4453

Agata Savary, Sara Stymne, Verginica Barbu Mititelu, Nathan Schneider, Carlos Ramisch, Joakim Nivre

Citations: 1

Foreword to NEJLT Volume 8, 2022

Northern European Journal of Language Technology Pub Date : 2022-12-31 DOI: 10.3384/nejlt.2000-1533.2022.4617

Leon Derczynski

Citations: 0

Part-of-Speech and Morphological Tagging of Algerian Judeo-Arabic

Northern European Journal of Language Technology Pub Date : 2022-12-14 DOI: 10.3384/nejlt.2000-1533.2022.4315

Ofra Tirosh-Becker, Michal Kessler, Oren M. Becker, Yonatan Belinkov

Citations: 0