Computational Linguistics最新文献_第2页

Do Multimodal Large Language Models and Humans Ground Language Similarly? 多模态大型语言模型与人类的语言基础相似吗？

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-07-30 DOI: 10.1162/coli_a_00531

Cameron Jones, Benjamin Bergen, Sean Trott

{"title":"Do Multimodal Large Language Models and Humans Ground Language Similarly?","authors":"Cameron Jones, Benjamin Bergen, Sean Trott","doi":"10.1162/coli_a_00531","DOIUrl":"https://doi.org/10.1162/coli_a_00531","url":null,"abstract":"Large Language Models (LLMs) have been criticized for failing to connect linguistic meaning to the world—for failing to solve the “symbol grounding problem.” Multimodal Large Language Models (MLLMs) offer a potential solution to this challenge by combining linguistic representations and processing with other modalities. However, much is still unknown about exactly how and to what degree MLLMs integrate their distinct modalities—and whether the way they do so mirrors the mechanisms believed to underpin grounding in humans. In humans, it has been hypothesized that linguistic meaning is grounded through “embodied simulation,” the activation of sensorimotor and affective representations reflecting described experiences. Across four pre-registered studies, we adapt experimental techniques originally developed to investigate embodied simulation in human comprehenders to ask whether MLLMs are sensitive to sensorimotor features that are implied but not explicit in descriptions of an event. In Experiment 1, we find sensitivity to some features (color and shape) but not others (size, orientation, and volume). In Experiment 2, we identify likely bottlenecks to explain an MLLM’s lack of sensitivity. In Experiment 3, we find that despite sensitivity to implicit sensorimotor features, MLLMs cannot fully account for human behavior on the same task. Finally, in Experiment 4, we compare the psychometric predictive power of different MLLM architectures and find that ViLT, a single-stream architecture, is more predictive of human responses to one sensorimotor feature (shape) than CLIP, a dual-encoder architecture—despite being trained on orders of magnitude less data. These results reveal strengths and limitations in the ability of current MLLMs to integrate language with other modalities, and also shed light on the likely mechanisms underlying human language comprehension.","PeriodicalId":49089,"journal":{"name":"Computational Linguistics","volume":"28 1","pages":""},"PeriodicalIF":9.3,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation 跨语言跨时态摘要：数据集、模型、评估

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-05-16 DOI: 10.1162/coli_a_00519

Ran Zhang, Jihed Ouni, Steffen Eger

{"title":"Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation","authors":"Ran Zhang, Jihed Ouni, Steffen Eger","doi":"10.1162/coli_a_00519","DOIUrl":"https://doi.org/10.1162/coli_a_00519","url":null,"abstract":"While summarization has been extensively researched in natural language processing (NLP), cross-lingual cross-temporal summarization (CLCTS) is a largely unexplored area that has the potential to improve cross-cultural accessibility and understanding. This paper comprehensively addresses the CLCTS task, including dataset creation, modeling, and evaluation. We (1) build the first CLCTS corpus with 328 (+127) instances for hDe-En and 289 (+212) for hEn-De, leveraging historical fiction texts and Wikipedia summaries in English and German; (2) examine the effectiveness of popular transformer end-to-end models with different intermediate finetuning tasks; (3) explore the potential of GPT-3.5 as a summarizer; (4) report evaluations from humans, GPT-4, and several recent automatic evaluation metrics. Our results indicate that intermediate task finetuned end-to-end models generate bad to moderate quality summaries while GPT-3.5, as a zero-shot summarizer, provides moderate to good quality outputs. GPT-3.5 also seems very adept at normalizing historical text. To assess data contamination in GPT-3.5, we design an adversarial attack scheme in which we find that GPT-3.5 performs slightly worse for unseen source documents compared to seen documents. Moreover, it sometimes hallucinates when the source sentences are inverted against its prior knowledge with a summarization accuracy of 0.67 for plot omission, 0.71 for entity swap, and 0.53 for plot negation. Overall, our regression results of model performances suggest that longer, older, and more complex source texts (all of which are more characteristic for historical language variants) are harder to summarize for all models, indicating the difficulty of the CLCTS task. Regarding evaluation, we observe that both GPT-4 and BERTScore correlate moderately with human evaluations but GPT-4 is prone to giving lower scores.","PeriodicalId":49089,"journal":{"name":"Computational Linguistics","volume":"44 1","pages":""},"PeriodicalIF":9.3,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141059917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Aligning Human and Computational Coherence Evaluations 调整人类和计算的一致性评估

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-05-02 DOI: 10.1162/coli_a_00518

Jia Peng Lim, Hady W. Lauw

{"title":"Aligning Human and Computational Coherence Evaluations","authors":"Jia Peng Lim, Hady W. Lauw","doi":"10.1162/coli_a_00518","DOIUrl":"https://doi.org/10.1162/coli_a_00518","url":null,"abstract":"Automated coherence metrics constitute an efficient and popular way to evaluate topic models. Previous works present a mixed picture of their presumed correlation with human judgment. This work proposes a novel sampling approach to mine topic representations at a large-scale while seeking to mitigate bias from sampling, enabling the investigation of widely-used automated coherence metrics via large corpora. Additionally, this article proposes a novel user study design, an amalgamation of different proxy tasks, to derive a finer insight into the human decision-making processes. This design subsumes the purpose of simple rating and outlier-detection user studies. Similar to the sampling approach, the user study conducted is very extensive, comprising forty study participants split into eight different study groups tasked with evaluating their respective set of one hundred topic representations. Usually, when substantiating the use of these metrics, human responses are treated as the golden standard. This article further investigates the reliability of human judgment by flipping the comparison and conducting a novel extended analysis of human response at the group and individual level against a generic corpus. The investigation results show a moderate to good correlation between these metrics and human judgment, especially for generic corpora, and derive further insights into the human perception of coherence. Analysing inter-metric correlations across corpora shows moderate to good correlation amongst these metrics. As these metrics depend on corpus statistics, this article further investigates the topical differences between corpora revealing nuances in applications of these metrics.","PeriodicalId":49089,"journal":{"name":"Computational Linguistics","volume":"64 1","pages":""},"PeriodicalIF":9.3,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140838018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Analyzing Dataset Annotation Quality Management in the Wild 分析野生数据集注释质量管理

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-03-25 DOI: 10.1162/coli_a_00516

Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych

{"title":"Analyzing Dataset Annotation Quality Management in the Wild","authors":"Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych","doi":"10.1162/coli_a_00516","DOIUrl":"https://doi.org/10.1162/coli_a_00516","url":null,"abstract":"Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models as well as for their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts. While practices and guidelines regarding dataset creation projects exist, to our knowledge, large-scale analysis has yet to be performed on how quality management is conducted when creating natural language datasets and whether these recommendations are followed. Therefore, we first survey and summarize recommended quality management practices for dataset creation as described in the literature and provide suggestions for applying them. Then, we compile a corpus of 591 scientific publications introducing text datasets and annotate it for quality-related aspects, such as annotator management, agreement, adjudication, or data validation. Using these annotations, we then analyze how quality management is conducted in practice. A majority of the annotated publications apply good or excellent quality management. However, we deem the effort of 30% of the works as only subpar. Our analysis also shows common errors, especially when using inter-annotator agreement and computing annotation error rates.","PeriodicalId":49089,"journal":{"name":"Computational Linguistics","volume":"17 1","pages":""},"PeriodicalIF":9.3,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140298308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

LLM–Assisted Data Augmentation for Chinese Dialogue–Level Dependency Parsing 用于中文对话级依赖关系解析的 LLM 辅助数据扩展

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-03-12 DOI: 10.1162/coli_a_00515

Meishan Zhang, Gongyao Jiang, Shuang Liu, Jing Chen, Min Zhang

{"title":"LLM–Assisted Data Augmentation for Chinese Dialogue–Level Dependency Parsing","authors":"Meishan Zhang, Gongyao Jiang, Shuang Liu, Jing Chen, Min Zhang","doi":"10.1162/coli_a_00515","DOIUrl":"https://doi.org/10.1162/coli_a_00515","url":null,"abstract":"Dialogue–level dependency parsing, despite its growing academic interest, often encounters underperformance issues due to resource shortages. A potential solution to this challenge is data augmentation. In recent years, large language models (LLMs) have demonstrated strong capabilities in generation which can facilitate data augmentation greatly. In this study, we focus on Chinese dialogue–level dependency parsing, presenting three simple and effective strategies with LLM to augment the original training instances, namely word–level, syntax–level and discourse–level augmentations, respectively. These strategies enable LLMs to either preserve or modify dependency structures, thereby assuring accuracy while increasing the diversity of instances at different levels. We conduct experiments on the benchmark dataset released by Jiang et al. (2023) to validate our approach. Results show that our method can greatly boost the parsing performance in various settings, particularly in dependencies among elementary discourse units (EDUs). Lastly, we provide in–depth analysis to show the key points of our data augmentation strategies.","PeriodicalId":49089,"journal":{"name":"Computational Linguistics","volume":"72 1","pages":""},"PeriodicalIF":9.3,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140127870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Systematic Review of Computational Approaches to Deciphering Bronze Age Aegean and Cypriot Scripts 系统回顾破译青铜时代爱琴海和塞浦路斯文字的计算方法

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-03-08 DOI: 10.1162/coli_a_00514

Maja Braović, Damir Krstinić, Maja Štula, Antonia Ivanda

引用次数: 0

A Novel Alignment-based Approach for PARSEVAL Measures 基于对齐的 PARSEVAL 测量新方法

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-03-04 DOI: 10.1162/coli_a_00512

Eunkyul Leah Jo, Angela Yoonseo Park, Jungyeul Park

引用次数: 0

Towards Faithful Model Explanation in NLP: A Survey 在 NLP 中实现忠实模型解释：一项调查

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-01-22 DOI: 10.1162/coli_a_00511

Qing Lyu, Marianna Apidianaki, Chris Callison-Burch

引用次数: 0

Context-aware Transliteration of Romanized South Asian Languages 罗马化南亚语言的语境感知译法

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-01-19 DOI: 10.1162/coli_a_00510

Christo Kirov, Cibu Johny, Anna Katanova, Alexander Gutkin, Brian Roark

{"title":"Context-aware Transliteration of Romanized South Asian Languages","authors":"Christo Kirov, Cibu Johny, Anna Katanova, Alexander Gutkin, Brian Roark","doi":"10.1162/coli_a_00510","DOIUrl":"https://doi.org/10.1162/coli_a_00510","url":null,"abstract":"While most transliteration research is focused on single tokens such as named entities – e.g., transliteration of “અમદાવાદ” from the Gujarati script to the Latin script “Ahmedabad” – the informal romanization prevalent in South Asia and elsewhere often requires transliteration of full sentences. The lack of large parallel text collections of full sentence (as opposed to single word) transliterations necessitates incorporation of contextual information into transliteration via non-parallel resources, such as via mono-script text collections. In this paper, we present a number of methods for improving transliteration in context for such a use scenario. Some of these methods in fact improve performance without making use of sentential context, allowing for better quantification of the degree to which contextual information in particular is responsible for system improvements. Our final systems, which ultimately rely upon ensembles including large pretrained language models finetuned on simulated parallel data, yield substantial improvements over the best previously reported results for full sentence transliteration from Latin to native script on all 12 languages in the Dakshina dataset (Roark et al. 2020), with an overall 3.3% absolute (18.6% relative) mean word-error rate reduction.","PeriodicalId":49089,"journal":{"name":"Computational Linguistics","volume":"31 1","pages":""},"PeriodicalIF":9.3,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139509245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

The Pitfalls of Defining Hallucination 定义幻觉的陷阱

IF 9.3 2区计算机科学

Computational Linguistics Pub Date : 2024-01-19 DOI: 10.1162/coli_a_00509

Kees van Deemter

引用次数: 0