Computational Linguistics: Latest Articles, Page 6

Improved N-Best Extraction with an Evaluation on Language Data

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-12-16 DOI: 10.1162/coli_a_00427

Johanna Björklund, F. Drewes, Anna Jonsson

Citations: 1

Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-12-10 DOI: 10.1162/coli_a_00430

Manaal Faruqui, Dilek Z. Hakkani-Tür

Citations: 11

To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-11-18 DOI: 10.1162/coli_a_00425

Gözde Gül Şahin

Abstract: Data-hungry deep neural networks have established themselves as the de facto standard for many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind their statistical counterparts in low-resource scenarios. One methodology to counterattack this problem is text augmentation, that is, generating new synthetic training data points from existing data. Although NLP has recently witnessed several new textual augmentation techniques, the field still lacks a systematic performance analysis on a diverse set of languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methodologies that perform changes on the syntax (e.g., cropping sub-sentences), token (e.g., random word insertion), and character (e.g., character swapping) levels. We systematically compare the methods on part-of-speech tagging, dependency parsing, and semantic role labeling for a diverse set of language families using various models, including the architectures that rely on pretrained multilingual contextualized language models such as mBERT. Augmentation most significantly improves dependency parsing, followed by part-of-speech tagging and semantic role labeling. We find the experimented techniques to be effective on morphologically rich languages in general rather than analytic languages such as Vietnamese. Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT, especially for dependency parsing. We identify the character-level methods as the most consistent performers, while synonym replacement and syntactic augmenters provide inconsistent improvements. Finally, we discuss that the results most heavily depend on the task, language pair (e.g., syntactic-level techniques mostly benefit higher-level tasks and morphologically richer languages), and model type (e.g., token-level augmentation provides significant improvements for BPE, while character-level ones give generally higher scores for char and mBERT based models).

Computational Linguistics, Volume 48, pages 5-42. Publication type: Journal Article.

Citations: 17
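
The character-level and token-level augmentations named in the abstract above (character swapping, random word insertion) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the rule of keeping the first and last characters fixed, and the sampling vocabulary are all assumptions made for illustration, and the sketch does not cover how tag labels would be realigned after an insertion.

```python
import random
from typing import List


def swap_adjacent_chars(token: str, rng: random.Random) -> str:
    """Swap one random pair of adjacent inner characters (hypothetical helper).

    Tokens shorter than four characters are returned unchanged so that the
    first and last characters can always stay in place.
    """
    if len(token) < 4:
        return token
    i = rng.randrange(1, len(token) - 2)  # index of the left character of the pair
    chars = list(token)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def insert_random_word(
    tokens: List[str], vocabulary: List[str], rng: random.Random
) -> List[str]:
    """Insert one word sampled from `vocabulary` at a random position (hypothetical helper)."""
    position = rng.randrange(len(tokens) + 1)
    word = rng.choice(vocabulary)
    return tokens[:position] + [word] + tokens[position:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = ["augmentation", "helps", "parsing"]
    print([swap_adjacent_chars(t, rng) for t in sentence])
    print(insert_random_word(sentence, ["often", "rarely"], rng))
```

Passing an explicit `random.Random` instance keeps each augmentation reproducible, which matters when comparing techniques across runs.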

Natural Language Processing and Computational Linguistics

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-10-18 DOI: 10.1162/coli_a_00420

Jun'ichi Tsujii

Abstract: [...] away other aspects of information, such as the speaker’s empathy, distinction of old/new information, emphasis, and so on. To climb up the hierarchy led to loss of information in lower levels of representation. In Tsujii (1986), instead of mapping at the abstract level, I proposed “transfer based on a bundle of features of all the levels”, in which the transfer would refer to all levels of representation in the source language to produce a corresponding representation in the target language (Figure 4). Because different levels of representation require different geometrical structures (i.e., different tree structures), the realization of this proposal had to wait for development of a clear mathematical formulation of feature-based representation with reentrancy, which allowed multiple levels (i.e., multiple trees) to be represented with their mutual relationships (see the next section). Another idea we adopted to systematize the transfer phase was recursive transfer (Nagao and Tsujii 1986), which was inspired by the idea of compositional semantics in CL. According to the views of linguists at the time, a language is an infinite set of expressions which, in turn, is defined by a finite set of rules. By applying this finite number of rules, one can generate infinitely many grammatical sentences of the language. Compositional semantics claimed that the meaning of a phrase was determined by combining the meanings of its subphrases, using the rules that generated the phrase. Compositional translation applied the same idea to translation. That is, the translation of a phrase was determined by combining the translations of its subphrases. In this way, translations of infinitely many sentences of the source language could be generated. Using the compositional translation approach, the translation of a sentence would be undertaken by recursively tracing a tree structure of a source sentence. The translation of a phrase would then be formulated by combining the translations of its subphrases. That is, translation would be constructed in a bottom up manner, from smaller units of translation to larger units. Furthermore, because the mapping of a phrase from the source to the target would be determined by the lexical head of the phrase, the lexical entry for the head word specified how to map a phrase to the target. In the MU project, we called this lexicon-driven, recursive transfer (Nagao and Tsujii 1986) (Figure 5).

Footnote 6: IS (Interface Structure) is dependent on a specific language. In particular, unlike the interlingual approach, Eurotra did not assume language-independent leximemes in ISs so that the transfer phase between the two ISs (source and target ISs) was indispensable. See footnote 5.

Figure 4: Description-based transfer (Tsujii 1986).

Computational Linguistics, Volume 47, Number 4, pages 707-727. Publication type: Journal Article.

Citations: 2
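
The recursive, head-driven transfer described in the excerpt above can be sketched in a few lines. This is a toy illustration only, not the MU project's system: the `Node` class, the `LEXICON` table, its English-to-French entries, and the fallback for unknown heads are all hypothetical. What it shows is the bottom-up order (subphrases are translated first) and the fact that the entry for the lexical head decides how the translated subphrases are combined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A phrase in the source tree, identified by its lexical head (hypothetical type)."""
    head: str
    children: List["Node"] = field(default_factory=list)


# A transfer rule receives the translations of the subphrases and builds
# the translation of the whole phrase.
TransferRule = Callable[[List[str]], str]

# Hypothetical transfer lexicon: one rule per head word.
LEXICON: Dict[str, TransferRule] = {
    "see": lambda parts: f"{parts[0]} voit {parts[1]}",
    "cat": lambda parts: "le chat",
    "dog": lambda parts: "le chien",
}


def transfer(node: Node) -> str:
    """Translate a phrase bottom-up: subphrases first, then the head's rule."""
    translated = [transfer(child) for child in node.children]
    rule = LEXICON.get(node.head)
    if rule is None:
        # Assumed fallback for unknown heads: copy the head and append the parts.
        return " ".join([node.head] + translated)
    return rule(translated)


if __name__ == "__main__":
    sentence = Node("see", [Node("cat"), Node("dog")])
    print(transfer(sentence))
```

Because each rule lives in the lexical entry of the head word, adding coverage for a new construction means adding a lexicon entry rather than changing the traversal.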

Natural Language Processing: A Machine Learning Perspective by Yue Zhang and Zhiyang Teng

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-10-04 DOI: 10.1162/coli_r_00423

Julia Ive

Citations: 0

Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-09-17 DOI: 10.1162/coli_a_00433

Saif M. Mohammad

Abstract: The importance and pervasiveness of emotions in our lives makes affective computing a tremendously important and vibrant line of work. Systems for automatic emotion recognition (AER) and sentiment analysis can be facilitators of enormous progress (e.g., in improving public health and commerce) but also enablers of great harm (e.g., for suppressing dissidents and manipulating voters). Thus, it is imperative that the affective computing community actively engage with the ethical ramifications of their creations. In this article, I have synthesized and organized information from AI Ethics and Emotion Recognition literature to present fifty ethical considerations relevant to AER. Notably, this ethics sheet fleshes out assumptions hidden in how AER is commonly framed, and in the choices often made regarding the data, method, and evaluation. Special attention is paid to the implications of AER on privacy and social groups. Along the way, key recommendations are made for responsible AER. The objective of the ethics sheet is to facilitate and encourage more thoughtfulness on why to automate, how to automate, and how to judge success well before the building of AER systems. Additionally, the ethics sheet acts as a useful introductory document on emotion recognition (complementing survey articles).

Computational Linguistics, Volume 48, pages 239-278. Publication type: Journal Article.

Citations: 34

Survey of Low-Resource Machine Translation

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-09-01 DOI: 10.1162/coli_a_00446

B. Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindřich Helcl, Alexandra Birch

Citations: 70

The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-08-11 DOI: 10.1162/coli_a_00418

Fernando Alva-Manchego, Carolina Scarton, Lucia Specia

Abstract: In order to simplify sentences, several rewriting operations can be performed, such as replacing complex words per simpler synonyms, deleting unnecessary information, and splitting long sentences. Despite this multi-operation nature, evaluation of automatic simplification systems relies on metrics that moderately correlate with human judgments on the simplicity achieved by executing specific operations (e.g., simplicity gain based on lexical replacements). In this article, we investigate how well existing metrics can assess sentence-level simplifications where multiple operations may have been applied and which, therefore, require more general simplicity judgments. For that, we first collect a new and more reliable data set for evaluating the correlation of metrics and human judgments of overall simplicity. Second, we conduct the first meta-evaluation of automatic metrics in Text Simplification, using our new data set (and other existing data) to analyze the variation of the correlation between metrics’ scores and human judgments across three dimensions: the perceived simplicity level, the system type, and the set of references used for computation. We show that these three aspects affect the correlations and, in particular, highlight the limitations of commonly used operation-specific metrics. Finally, based on our findings, we propose a set of recommendations for automatic evaluation of multi-operation simplifications, suggesting which metrics to compute and how to interpret their scores.

Computational Linguistics, Volume 47, pages 861-889. Publication type: Journal Article.

Citations: 46

LFG Generation from Acyclic F-Structures is NP-Hard

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-08-11 DOI: 10.1162/coli_a_00419

Jürgen Wedekind, R. Kaplan

Citations: 1

Are Ellipses Important for Machine Translation?

IF 9.3, Zone 2 (Computer Science)

Computational Linguistics Pub Date : 2021-08-05 DOI: 10.1162/coli_a_00414

Payal Khullar

Citations: 0