Latest Articles in Computational Linguistics

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-06-05 | DOI: 10.1162/coli_a_00464
Jan-Christoph Klie, B. Webber, Iryna Gurevych
Abstract: Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising number of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have been devised over the years. While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets. This raises strong concerns about methods' general performance and makes it difficult to assess their strengths and weaknesses. We therefore reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets for text classification as well as token and span labeling. In addition, we define a uniform evaluation setup including a new formalization of the annotation error detection task, evaluation protocol, and general best practices. To facilitate future research and reproducibility, we release our datasets and implementations in an easy-to-use and open source software package.
Computational Linguistics 49(1): 157-198.
Citations: 18
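One of the oldest families of annotation error detection methods surveyed here, variation-based detection (in the spirit of Dickinson and Meurers's variation n-grams), can be sketched in a few lines: flag a token's label when it conflicts with the majority label the same token receives elsewhere in the corpus. The toy corpus and the single-token context (n = 1) below are simplifications for illustration, not the paper's actual benchmark setup.

```python
from collections import Counter, defaultdict

def variation_flags(corpus):
    """Flag token annotations whose label disagrees with the majority
    label seen for the same token elsewhere (variation detection, n=1)."""
    by_token = defaultdict(Counter)
    for sent in corpus:
        for tok, tag in sent:
            by_token[tok][tag] += 1
    flags = []
    for i, sent in enumerate(corpus):
        for j, (tok, tag) in enumerate(sent):
            counts = by_token[tok]
            majority, _ = counts.most_common(1)[0]
            if len(counts) > 1 and tag != majority:
                flags.append((i, j, tok, tag, majority))
    return flags

# "run" is tagged NOUN twice and VERB once, so the VERB occurrence
# is flagged as a potential annotation error.
corpus = [
    [("the", "DET"), ("run", "NOUN")],
    [("the", "DET"), ("run", "VERB")],
    [("a", "DET"), ("run", "NOUN")],
]
print(variation_flags(corpus))
```

Real detectors widen the disambiguating context (larger n, or model-based scores) to avoid flagging genuinely ambiguous words such as "run".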
Noun2Verb: Probabilistic Frame Semantics for Word Class Conversion
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-05-12 | DOI: 10.1162/coli_a_00447
Lei Yu, Yang Xu
Abstract: Humans can flexibly extend word usages across different grammatical classes, a phenomenon known as word class conversion. Noun-to-verb conversion, or denominal verb usage (e.g., to Google a cheap flight), is one of the most prevalent forms of word class conversion. However, existing natural language processing systems are impoverished in interpreting and generating novel denominal verb usages. Previous work has suggested that novel denominal verb usages are comprehensible if the listener can compute the intended meaning based on shared knowledge with the speaker. Here we explore a computational formalism for this proposal couched in frame semantics. We present a formal framework, Noun2Verb, that simulates the production and comprehension of novel denominal verb usages by modeling shared knowledge of speaker and listener in semantic frames. We evaluate an incremental set of probabilistic models that learn to interpret and generate novel denominal verb usages via paraphrasing. We show that a model in which the speaker and listener cooperatively learn the joint distribution over semantic frame elements better explains the empirical denominal verb usages than state-of-the-art language models, evaluated against data from (1) contemporary English in both adult and child speech, (2) contemporary Mandarin Chinese, and (3) the historical development of English. Our work grounds word class conversion in probabilistic frame semantics and bridges the gap between natural language processing systems and humans in lexical creativity.
Computational Linguistics 48(1): 783-818.
Citations: 0
It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-04-25 | DOI: 10.1162/coli_a_00463
Zheng Tang, M. Surdeanu
Abstract: We propose an explainable approach for relation extraction that mitigates the tension between generalization and explainability by jointly training for the two goals. Our approach uses a multi-task learning architecture that jointly trains a classifier for relation extraction and a sequence model that labels words in the context of the relations, explaining the decisions of the relation classifier. We also convert the model outputs to rules to bring global explanations to this approach. The sequence model is trained using a hybrid strategy: supervised, when supervision from pre-existing patterns is available, and semi-supervised otherwise. In the latter situation, we treat the sequence model's labels as latent variables and learn the best assignment that maximizes the performance of the relation classifier. We evaluate the proposed approach on two datasets and show that the sequence model provides labels that serve as accurate explanations for the relation classifier's decisions and, importantly, that the joint training generally improves the performance of the relation classifier. We also evaluate the performance of the generated rules and show that the new rules are a useful complement to the manual rules, bringing the rule-based system much closer to the neural models.
Computational Linguistics 49(1): 117-156.
Citations: 2
Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-04-22 | DOI: 10.1162/coli_a_00455
Ilia Kuznetsov, Jan Buchmann, Max Eichler, Iryna Gurevych
Abstract: Peer review is a key component of the publishing process in most fields of science. Increasing submission rates put a strain on reviewing quality and efficiency, motivating the development of applications to support the reviewing and editorial work. While existing NLP studies focus on the analysis of individual texts, editorial assistance often requires modeling interactions between pairs of texts, yet general frameworks and datasets to support this scenario are missing. Relationships between texts are the core object of intertextuality theory, a family of approaches in literary studies not yet operationalized in NLP. Inspired by prior theoretical work, we propose the first intertextual model of text-based collaboration, which encompasses three major phenomena that make up a full iteration of the review-revise-and-resubmit cycle: pragmatic tagging, linking, and long-document version alignment. While peer review is used across the fields of science and publication formats, existing datasets solely focus on conference-style review in computer science. Addressing this, we instantiate our proposed model in the first annotated multidomain corpus of journal-style post-publication open peer review, and provide detailed insights into the practical aspects of intertextual annotation. Our resource is a major step toward multidomain, fine-grained applications of NLP in editorial support for peer review, and our intertextual framework paves the path for general-purpose modeling of text-based collaboration. We make our corpus, detailed annotation guidelines, and accompanying code publicly available.
Computational Linguistics 48(1): 949-986.
Citations: 14
Investigating Language Relationships in Multilingual Sentence Encoders Through the Lens of Linguistic Typology
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-04-13 | DOI: 10.1162/coli_a_00444
Rochelle Choenni, Ekaterina Shutova
Abstract: Multilingual sentence encoders have seen much success in cross-lingual model transfer for downstream NLP tasks. The success of this transfer is, however, dependent on the model's ability to encode the patterns of cross-lingual similarity and variation. Yet, we know relatively little about the properties of individual languages or the general patterns of linguistic variation that the models encode. In this article, we investigate these questions by leveraging knowledge from the field of linguistic typology, which studies and documents structural and semantic variation across languages. We propose methods for separating language-specific subspaces within state-of-the-art multilingual sentence encoders (LASER, M-BERT, XLM, and XLM-R) with respect to a range of typological properties pertaining to lexical, morphological, and syntactic structure. Moreover, we investigate how typological information about languages is distributed across all layers of the models. Our results show interesting differences in encoding linguistic variation associated with different pretraining strategies. In addition, we propose a simple method to study how shared typological properties of languages are encoded in two state-of-the-art multilingual models, M-BERT and XLM-R. The results provide insight into their information-sharing mechanisms and suggest that these linguistic properties are encoded jointly across typologically similar languages in these models.
Computational Linguistics 48(1): 635-672.
Citations: 12
Tractable Parsing for CCGs of Bounded Degree
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-04-07 | DOI: 10.1162/coli_a_00441
Lena Katharina Schiffer, Marco Kuhlmann, G. Satta
Abstract: Unlike other mildly context-sensitive formalisms, Combinatory Categorial Grammar (CCG) cannot be parsed in polynomial time when the size of the grammar is taken into account. Refining this result, we show that the parsing complexity of CCG is exponential only in the maximum degree of composition. When that degree is fixed, parsing can be carried out in polynomial time. Our finding is interesting from a linguistic perspective because a bounded degree of composition has been suggested as a universal constraint on natural language grammar. Moreover, ours is the first complexity result for a version of CCG that includes substitution rules, which are used in practical grammars but have been ignored in theoretical work.
Computational Linguistics 48(1): 593-633.
Citations: 2
The Impact of Edge Displacement Vaserstein Distance on UD Parsing Performance
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-04-07 | DOI: 10.1162/coli_a_00440
Mark Anderson, Carlos Gómez-Rodríguez
Abstract: We contribute to the discussion on parsing performance in NLP by introducing a measurement that evaluates the differences between the distributions of edge displacement (the directed distance of edges) seen in training and test data. We hypothesize that this measurement will be related to differences observed in parsing performance across treebanks. We motivate this by building upon previous work and then attempt to falsify this hypothesis by using a number of statistical methods. We establish that there is a statistical correlation between this measurement and parsing performance even when controlling for potential covariants. We then use this to establish a sampling technique that gives us an adversarial and complementary split. This gives an idea of the lower and upper bounds of parsing systems for a given treebank in lieu of freshly sampled data. In a broader sense, the methodology presented here can act as a reference for future correlation-based exploratory work in NLP.
Computational Linguistics 48(1): 517-554.
Citations: 0
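The paper's measurement compares the empirical distributions of edge displacement in training versus test data via the 1-Wasserstein (Vaserstein) distance. A minimal sketch, assuming 1-based head indices with 0 marking the root, and displacement defined as head position minus dependent position (the sign convention here is an assumption):

```python
from collections import Counter

def edge_displacements(heads):
    """Directed edge distances for one tree; heads[i] is the 1-based
    head of token i+1, with 0 marking the root (root edge excluded)."""
    return [h - (i + 1) for i, h in enumerate(heads) if h != 0]

def wasserstein_1(xs, ys):
    """1-Wasserstein distance between two empirical integer
    distributions, via the integral of |CDF_x - CDF_y|."""
    cx, cy = Counter(xs), Counter(ys)
    lo, hi = min(min(xs), min(ys)), max(max(xs), max(ys))
    fx = fy = dist = 0.0
    for v in range(lo, hi):          # the last support point contributes 0
        fx += cx[v] / len(xs)
        fy += cy[v] / len(ys)
        dist += abs(fx - fy)
    return dist

# Toy "treebanks" of one tree each: displacements [1, -1] vs. [-1, -2].
train = edge_displacements([2, 0, 2])
test = edge_displacements([0, 1, 1])
print(wasserstein_1(train, test))
```

Half the mass must move from displacement 1 to -2 (cost 0.5 x 3), so the distance is 1.5; in the paper this is computed between full treebank-level distributions.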
UDapter: Typology-based Language Adapters for Multilingual Dependency Parsing and Sequence Labeling
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-04-07 | DOI: 10.1162/coli_a_00443
A. Üstün, Arianna Bisazza, G. Bouma, G. van Noord
Abstract: Recent advances in multilingual language modeling have brought the idea of a truly universal parser closer to reality. However, such models are still not immune to the "curse of multilinguality": cross-language interference and restrained model capacity remain major obstacles. To address this, we propose a novel language adaptation approach by introducing contextual language adapters to a multilingual parser. Contextual language adapters make it possible to learn adapters via language embeddings while sharing model parameters across languages based on contextual parameter generation. Moreover, our method allows for an easy but effective integration of existing linguistic typology features into the parsing model. Because not all typological features are available for every language, we further combine typological feature prediction with parsing in a multi-task model that achieves very competitive parsing performance without the need for an external prediction system for missing features. The resulting parser, UDapter, can be used for dependency parsing as well as sequence labeling tasks such as POS tagging, morphological tagging, and NER. In dependency parsing, it outperforms strong monolingual and multilingual baselines on the majority of both high-resource and low-resource (zero-shot) languages, showing the success of the proposed adaptation approach. In sequence labeling tasks, our parser surpasses the baseline on high-resource languages, and performs very competitively in a zero-shot setting. Our in-depth analyses show that adapter generation via typological features of languages is key to this success.
Computational Linguistics 48(1): 555-592.
Citations: 5
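The core idea of contextual parameter generation, adapter weights produced as a function of a language embedding rather than stored separately per language, can be sketched as follows. All dimensions and the single linear generator are illustrative simplifications; the actual model generates full bottleneck adapters inside every transformer layer and feeds typological features into the language embeddings.

```python
import numpy as np

rng = np.random.default_rng(42)
d_lang, d_hidden, d_bottleneck = 4, 8, 2   # toy sizes (assumptions)

# Parameter generator: linear maps from the language embedding to the
# flattened weights of that language's down- and up-projections.
W_down_gen = rng.normal(0, 0.1, size=(d_lang, d_hidden * d_bottleneck))
W_up_gen = rng.normal(0, 0.1, size=(d_lang, d_bottleneck * d_hidden))

def contextual_adapter(lang_emb):
    """Generate a bottleneck adapter (with residual connection) from a
    language embedding; all languages share the generator's parameters."""
    down = (lang_emb @ W_down_gen).reshape(d_hidden, d_bottleneck)
    up = (lang_emb @ W_up_gen).reshape(d_bottleneck, d_hidden)
    def adapter(h):
        return h + np.maximum(h @ down, 0.0) @ up
    return adapter

# Two languages get distinct adapters from the same shared generator.
adapt_en = contextual_adapter(rng.normal(size=d_lang))
adapt_fi = contextual_adapter(rng.normal(size=d_lang))
h = rng.normal(size=d_hidden)
print(adapt_en(h).shape)
```

Because only the generator is trained, related languages with similar embeddings receive similar adapters, which is the sharing mechanism the abstract credits for zero-shot performance.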
Erratum for "Formal Basis of a Language Universal"
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-04-01 | DOI: 10.1162/coli_x_00432
Miloš Stanojević, Mark Steedman
Abstract: In the paper "Formal Basis of a Language Universal" by Miloš Stanojević and Mark Steedman in Computational Linguistics 47:1 (https://doi.org/10.1162/coli_a_00394), there is an error in example (12) on page 17. The two occurrences of the notation \W should appear as |W. The paper has been updated so that the paragraph reads: In the full theory, these rules are generalized to "second level" cases, in which the secondary function is of the form (Y|Z)|W, such as the following "forward crossing" instance, in which | matches either / or \ in both input and output:
Computational Linguistics 48(1): 237.
Citations: 0
Onception: Active Learning with Expert Advice for Real World Machine Translation
IF 9.3 | CAS Q2 | Computer Science
Computational Linguistics | Pub Date: 2022-03-09 | DOI: 10.1162/coli_a_00473
Vânia Mendonça, Ricardo Rei, Luísa Coheur, Alberto Sardinha (INESC-ID Lisboa, Instituto Superior Técnico, Unbabel AI)
Abstract: Active learning can play an important role in low-resource settings (i.e., where annotated data is scarce) by selecting which instances may be most worth annotating. Most active learning approaches for Machine Translation assume the existence of a pool of sentences in a source language and rely on human annotators to provide translations or post-edits, which can still be costly. In this article, we apply active learning to a real-world human-in-the-loop scenario in which we assume that: (1) the source sentences may not be readily available, but instead arrive in a stream; (2) the automatic translations receive feedback in the form of a rating, instead of a correct/edited translation, since the human in the loop might be a user looking for a translation but unable to provide one. To tackle the challenge of deciding whether each incoming source–translation pair is worth querying for human feedback, we resort to a number of stream-based active learning query strategies. Moreover, because we do not know in advance which query strategy will be the most adequate for a certain language pair and set of Machine Translation models, we propose to dynamically combine multiple strategies using prediction with expert advice. Our experiments on different language pairs and feedback settings show that using active learning allows us to converge on the best Machine Translation systems with fewer human interactions. Furthermore, combining multiple strategies using prediction with expert advice outperforms several individual active learning strategies with even fewer interactions, particularly in partial feedback settings.
Computational Linguistics 49(1): 325-372.
Citations: 2
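The "prediction with expert advice" combination can be sketched with the classic exponentially weighted (Hedge) forecaster: each stream-based query strategy is an expert, and its weight decays with the loss its advice incurs. The losses below are hypothetical; in the paper the signal derives from human ratings of the translations.

```python
import math

def hedge(n_experts, loss_rounds, eta=0.5):
    """Exponentially weighted forecaster over query strategies.
    loss_rounds[t][i] is the loss of expert i at round t; returns the
    final normalized weights (the mixture used to pick a strategy)."""
    weights = [1.0] * n_experts
    for losses in loss_rounds:
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy run: strategy 0 consistently incurs lower loss, so the
# forecaster shifts its probability mass toward it.
probs = hedge(2, [[0.0, 1.0]] * 3)
print(probs)
```

After three rounds, strategy 0 holds roughly 82% of the mass (1 vs. exp(-1.5)); online-learning regret bounds guarantee the mixture tracks the best single strategy in hindsight, which is why no strategy needs to be chosen in advance.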