Computational Linguistics: Latest Articles

My Tenure as the Editor-in-Chief of Computational Linguistics
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2024-01-10 DOI: 10.1162/coli_e_00505
Hwee Tou Ng
Abstract: Time flies, and it has been close to five and a half years since I became the editor-in-chief of Computational Linguistics on 15 July 2018. In this editorial, I describe the changes I have introduced at the journal and highlight its achievements and challenges.
Citations: 0
Topics in the Haystack: Enhancing Topic Quality through Corpus Expansion
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2024-01-08 DOI: 10.1162/coli_a_00506
Anton Thielmann, Arik Reuter, Quentin Seifert, Elisabeth Bergherr, Benjamin Säfken
Abstract: Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach to topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence and document themes and goes beyond simply analyzing word frequencies in the data. Through simple corpus expansion, our model can detect latent topics that may include uncommon words or neologisms, as well as words not present in the documents themselves. Additionally, we propose several new evaluation metrics based on intruder words and similarity measures in the semantic space. We present correlation coefficients with human identification of intruder words and achieve near-human-level results on the word-intrusion task. We demonstrate the competitive performance of our method in a large benchmark study, achieving superior results compared to state-of-the-art topic modeling and document clustering models.
Citations: 0
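The word-intrusion idea behind the evaluation metrics above can be made concrete with a small embedding-based sketch: given a topic's top words plus one intruder drawn from another topic, flag the word least similar on average to the rest. This is an illustrative sketch under our own assumptions (toy 2-D vectors, a simple mean-cosine outlier rule), not the paper's exact metric.

```python
import numpy as np

def intruder_by_similarity(vectors, words):
    """Return the word least similar (by mean cosine) to the others.
    Sketch of an embedding-based word-intrusion check, not the
    paper's exact formulation."""
    V = np.array([vectors[w] for w in words], dtype=float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit-normalise rows
    sims = V @ V.T                                    # pairwise cosines
    np.fill_diagonal(sims, 0.0)                       # ignore self-similarity
    avg = sims.sum(axis=1) / (len(words) - 1)         # mean similarity to the rest
    return words[int(np.argmin(avg))]

# Toy 2-D "embeddings": four related words and one outlier.
toy = {
    "river": [1.0, 0.10], "lake": [0.9, 0.20], "stream": [1.0, 0.00],
    "pond": [0.8, 0.15], "keyboard": [0.0, 1.00],
}
found = intruder_by_similarity(toy, ["river", "lake", "stream", "pond", "keyboard"])
print(found)  # prints "keyboard"
```

A model (or human) that reliably picks the planted intruder is treated as evidence that the topic's top words are coherent.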
Common Flaws in Running Human Evaluation Experiments in NLP
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2024-01-08 DOI: 10.1162/coli_a_00508
Craig Thomson, Ehud Reiter, Anya Belz
Abstract: While conducting a coordinated set of repeat runs of human evaluation experiments in NLP, we discovered flaws in every single experiment we selected for inclusion via a systematic process. In this paper, we describe the types of flaws we discovered, which include coding errors (e.g., loading the wrong system outputs to evaluate), failure to follow standard scientific practice (e.g., ad hoc exclusion of participants and responses), and mistakes in reported numerical results (e.g., reported numbers not matching experimental data). If these problems are widespread, it would have worrying implications for the rigour of NLP evaluation experiments as currently conducted. We discuss what researchers can do to reduce the occurrence of such flaws, including pre-registration, better code development practices, increased testing and piloting, and post-publication addressing of errors.
Citations: 0
A Bayesian approach to uncertainty in word embedding bias estimation
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2024-01-08 DOI: 10.1162/coli_a_00507
Alicja Dobrzeniecka, Rafal Urbaniak
Abstract: Multiple measures, such as WEAT or MAC, attempt to quantify the magnitude of bias present in word embeddings in terms of a single-number metric. However, such metrics and the related statistical significance calculations rely on treating pre-averaged data as individual data points and employing bootstrapping techniques with low sample sizes. We show that similar results can be easily obtained using such methods even if the data are generated by a null model lacking the intended bias. Consequently, we argue that this approach generates false confidence. To address this issue, we propose a Bayesian alternative: hierarchical Bayesian modeling, which enables a more uncertainty-sensitive inspection of bias in word embeddings at different levels of granularity. To showcase our method, we apply it to Religion, Gender, and Race word lists from the original research, together with our control neutral word lists. We deploy the method using Google, GloVe, and Reddit embeddings. Further, we utilize our approach to evaluate a debiasing technique applied to the Reddit word embedding. Our findings reveal a more complex landscape than suggested by the proponents of single-number metrics. The datasets and source code for the paper are publicly available.
Citations: 0
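The pitfall the abstract describes can be reproduced in a few lines: compute a MAC-style mean-cosine association, pre-average it down to one number per target word, then bootstrap those few values. The sketch below is illustrative only (the null setup, sizes, and names are our own assumptions, and the score is merely in the spirit of metrics like MAC, not an exact published formulation).

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_cosines(targets, attributes):
    """Cosine similarity between every target and attribute vector.
    Building block for a MAC-style mean-cosine score (a sketch, not
    the exact published metric)."""
    def unit(M):
        M = np.asarray(M, dtype=float)
        return M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit(targets) @ unit(attributes).T

# Null model: random isotropic vectors, so any true association is ~0.
targets = rng.normal(size=(8, 50))      # e.g., 8 target words
attributes = rng.normal(size=(10, 50))  # e.g., 10 attribute words
cosines = pairwise_cosines(targets, attributes)

# Pre-averaging collapses 80 similarities to 8 data points.
per_target = cosines.mean(axis=1)
score = float(per_target.mean())

# Bootstrapping those 8 pre-averaged values -- the practice criticised in
# the paper -- produces a tidy-looking interval even under this null.
boot = np.array([rng.choice(per_target, size=per_target.size).mean()
                 for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
```

The interval is centred on a noisy point estimate computed from only eight numbers, which is why the authors argue it conveys false confidence and propose hierarchical Bayesian modeling instead.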
Assessing the Cross-linguistic Utility of Abstract Meaning Representation
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2023-12-19 DOI: 10.1162/coli_a_00503
Shira Wein, Nathan Schneider
Abstract: Semantic representations capture the meaning of a text. Abstract Meaning Representation (AMR), a type of semantic representation, focuses on predicate-argument structure and abstracts away from surface form. Though AMR was developed initially for English, it has now been adapted to a multitude of languages in the form of non-English annotation schemas, cross-lingual text-to-AMR parsing, and AMR-to-(non-English) text generation. We advance prior work on cross-lingual AMR by thoroughly investigating the amount, types, and causes of differences that appear in AMRs of different languages. Further, we compare how AMR captures meaning in cross-lingual pairs versus strings, and show that AMR graphs are able to draw out fine-grained differences between parallel sentences. We explore three primary research questions: (1) What are the types and causes of differences in parallel AMRs? (2) How can we measure the amount of difference between AMR pairs in different languages? (3) Given that AMR structure is affected by language and exhibits cross-lingual differences, how do cross-lingual AMR pairs compare to string-based representations of cross-lingual sentence pairs? We find that the source language itself does have a measurable impact on AMR structure, and that translation divergences and annotator choices also lead to differences in cross-lingual AMR pairs. We explore the implications of this finding throughout our study, concluding that, while AMR is useful for capturing meaning across languages, evaluations need to take source language influences into account if they are to paint an accurate picture of system output, and of meaning generally.
Citations: 0
UG-schematic Annotation for Event Nominals: A Case Study in Mandarin Chinese
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2023-12-19 DOI: 10.1162/coli_a_00504
Wenxi Li, Guy Emerson, Yutong Zhang, Weiwei Sun
Abstract: Divergence of languages observed at the surface level is a major challenge for multilingual data representation, especially when typologically distant languages are involved. Drawing inspiration from a formalist Chomskyan perspective on language universals, Universal Grammar (UG), this article employs deductively pre-defined universals to analyse a multilingually heterogeneous phenomenon: event nominals. In this way, the deeper universality of event nominals beneath their considerable divergence across languages is uncovered, which empowers us to break barriers between languages and thus extend insights from some synthetic languages to a non-inflectional language, Mandarin Chinese. Our empirical investigation also demonstrates that this UG-inspired schema is effective: with its assistance, the inter-annotator agreement (IAA) for identifying event nominals in Mandarin grows from 88.02% to 94.99%, and automatic detection of event-reading nominalizations on the newly established data achieves an accuracy of 94.76% and an F1 score of 91.3%, which significantly surpass those achieved on the pre-existing resource by 9.8% and 5.2%, respectively. Our systematic analysis also sheds light on nominal semantic role labelling (SRL): by providing a clear definition and classification of the arguments of event nominals, the IAA of this task significantly increases from 90.46% to 98.04%.
Citations: 0
Can Large Language Models Transform Computational Social Science?
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2023-12-12 DOI: 10.1162/coli_a_00502
Caleb Ziems, Omar Shaikh, Zhehao Zhang, William Held, Jiaao Chen, Diyi Yang
Abstract: Large Language Models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the Computational Social Science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers' gold references. We conclude that the performance of today's LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to meaningfully participate in social science analysis in partnership with humans.
Citations: 0
Stance Detection with Explanations
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2023-12-12 DOI: 10.1162/coli_a_00501
Rudra Ranajee Saha, Raymond T. Ng, Laks V. S. Lakshmanan
Abstract: Identification of stance has recently gained a lot of attention with the extreme growth of fake news and filter bubbles. Over the last decade, many feature-based and deep-learning approaches have been proposed to solve stance detection. However, almost none of the existing works focus on providing a meaningful explanation for their predictions. In this work, we study stance detection with an emphasis on generating explanations for the predicted stance by capturing the pivotal argumentative structure embedded in a document. We propose building a Stance Tree, which utilizes rhetorical parsing to construct an evidence tree, and using Dempster-Shafer theory to aggregate the evidence. Human studies show that our unsupervised technique of generating stance explanations outperforms the SOTA extractive summarization method in terms of informativeness, non-redundancy, coverage, and overall quality. Furthermore, experiments show that our explanation-based stance prediction excels or matches the performance of the SOTA model on various benchmark datasets.
Citations: 0
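The evidence-aggregation step mentioned above rests on Dempster-Shafer theory. A minimal sketch of Dempster's rule of combination over a two-way stance frame follows; the frame, the mass assignments, and all names here are illustrative assumptions, and the paper's tree-structured aggregation over rhetorical-parse evidence is more involved.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets. Generic textbook formulation, used here only
    to illustrate the aggregation primitive."""
    combined, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + pa * pb
        else:
            conflict += pa * pb          # mass assigned to disjoint sets
    k = 1.0 - conflict                   # renormalise by non-conflicting mass
    return {s: v / k for s, v in combined.items()}

S, O = frozenset({"support"}), frozenset({"oppose"})
theta = S | O                            # full frame: undecided mass
m1 = {S: 0.6, theta: 0.4}                # one piece of evidence leans "support"
m2 = {S: 0.5, O: 0.2, theta: 0.3}        # another is more mixed
combined = dempster_combine(m1, m2)      # e.g., combined[S] == 0.68/0.88
```

Combining the two pieces of evidence concentrates mass on "support" (0.68/0.88 ≈ 0.77) while retaining some undecided mass, which is the appeal of this framework for aggregating partial, conflicting evidence from different parts of a document.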
Polysemy - Evidence from Linguistics, Behavioural Science and Contextualised Language Models
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2023-12-12 DOI: 10.1162/coli_a_00500
Janosch Haber, Massimo Poesio
Abstract: Polysemy is the type of lexical ambiguity where a word has multiple distinct but related interpretations. In the past decade, it has been the subject of a great many studies across multiple disciplines including linguistics, psychology, neuroscience, and computational linguistics, which have made it increasingly clear that the complexity of polysemy precludes simple, universal answers, especially concerning the representation and processing of polysemous words. But fuelled by the growing availability of large, crowdsourced datasets providing substantial empirical evidence; improved behavioural methodology; and the development of contextualised language models capable of encoding the fine-grained meaning of a word within a given context, the literature on polysemy has recently developed more complex theoretical analyses. In this survey we discuss these recent contributions to the investigation of polysemy against the backdrop of a long legacy of research across multiple decades and disciplines. Our aim is to bring together different perspectives to achieve a more complete picture of the heterogeneity and complexity of the phenomenon of polysemy. Specifically, we highlight evidence supporting a range of hybrid models of the mental processing of polysemes. These hybrid models combine elements from different previous theoretical approaches to explain patterns and idiosyncrasies in the processing of polysemous words that the best-known models so far have failed to account for. Our literature review finds that i) traditional analyses of polysemy can be limited in their generalisability by loose definitions and selective materials; ii) linguistic tests provide useful evidence on individual cases, but fail to capture the full range of factors involved in the processing of polysemous sense extensions; and iii) recent behavioural (psycho)linguistic studies, large-scale annotation efforts, and investigations leveraging contextualised language models provide accumulating evidence suggesting that polysemous sense similarity covers a wide spectrum between identity of sense and homonymy-like unrelatedness of meaning. We hope that the interdisciplinary account of polysemy provided in this survey inspires further fundamental research on the nature of polysemy and better equips applied research to deal with the complexity surrounding the phenomenon, e.g. by enabling the development of benchmarks and testing paradigms for large language models informed by a greater portion of the rich evidence on the phenomenon currently available.
Citations: 0
My Big, Fat 50-Year Journey
IF 9.3 | CAS Q2, Computer Science
Computational Linguistics Pub Date: 2023-12-06 DOI: 10.1162/coli_a_00499
Martha Palmer
Citations: 0