Computational Linguistics: Latest Publications

Do Multimodal Large Language Models and Humans Ground Language Similarly?
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-07-30 · DOI: 10.1162/coli_a_00531
Cameron Jones, Benjamin Bergen, Sean Trott
Abstract: Large Language Models (LLMs) have been criticized for failing to connect linguistic meaning to the world—for failing to solve the "symbol grounding problem." Multimodal Large Language Models (MLLMs) offer a potential solution to this challenge by combining linguistic representations and processing with other modalities. However, much is still unknown about exactly how and to what degree MLLMs integrate their distinct modalities—and whether the way they do so mirrors the mechanisms believed to underpin grounding in humans. In humans, it has been hypothesized that linguistic meaning is grounded through "embodied simulation," the activation of sensorimotor and affective representations reflecting described experiences. Across four pre-registered studies, we adapt experimental techniques originally developed to investigate embodied simulation in human comprehenders to ask whether MLLMs are sensitive to sensorimotor features that are implied but not explicit in descriptions of an event. In Experiment 1, we find sensitivity to some features (color and shape) but not others (size, orientation, and volume). In Experiment 2, we identify likely bottlenecks that explain an MLLM's lack of sensitivity. In Experiment 3, we find that despite sensitivity to implicit sensorimotor features, MLLMs cannot fully account for human behavior on the same task. Finally, in Experiment 4, we compare the psychometric predictive power of different MLLM architectures and find that ViLT, a single-stream architecture, is more predictive of human responses to one sensorimotor feature (shape) than CLIP, a dual-encoder architecture, despite being trained on orders of magnitude less data. These results reveal strengths and limitations in the ability of current MLLMs to integrate language with other modalities, and also shed light on the likely mechanisms underlying human language comprehension.
Citations: 0
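The implied-feature paradigm this abstract describes can be probed with off-the-shelf dual encoders. The sketch below scores one image against two sentences that differ only in an implied feature (here, orientation), using the Hugging Face CLIP checkpoint (a real model); the image file and the sentence pair are hypothetical stand-ins, not the paper's actual stimuli.

```python
# Minimal sentence-image matching probe with a dual-encoder (CLIP).
# Assumes a local image "nail_in_wall.jpg" (hypothetical placeholder).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("nail_in_wall.jpg")  # depicts a horizontally oriented nail
texts = [
    "He hammered the nail into the wall.",   # implies horizontal orientation
    "He hammered the nail into the floor.",  # implies vertical orientation
]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, 2)
probs = logits.softmax(dim=-1).squeeze(0)

# A model sensitive to implied orientation should prefer the matching sentence.
for text, p in zip(texts, probs.tolist()):
    print(f"{p:.3f}  {text}")
```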
Automatic Language Identification in Texts by Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, and Krister Lindén
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-06-10 · DOI: 10.1162/coli_r_00521
Tom Lippincott
Citations: 0
Relation Extraction in Underexplored Biomedical Domains: A Diversity-optimised Sampling and Synthetic Data Generation Approach
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-05-21 · DOI: 10.1162/coli_a_00520
Maxime Delmas, Magdalena Wysocka, André Freitas
Abstract: The sparsity of labelled data is an obstacle to the development of Relation Extraction (RE) models and the completion of databases in various biomedical areas. Despite its high interest for drug discovery, the literature on natural products, which reports the identification of potential bioactive compounds from organisms, is a concrete example of such an overlooked topic. To mark the start of this new task, we created the first curated evaluation dataset and extracted literature items from the LOTUS database to build training sets. To this end, we developed a new sampler, inspired by diversity metrics in ecology, named the Greedy Maximum Entropy sampler (https://github.com/idiap/gme-sampler). The strategic optimization of both balance and diversity of the selected items in the evaluation set is important given the resource-intensive nature of manual curation. After quantifying the noise in the training set, in the form of discrepancies between the text of input abstracts and the expected output labels, we explored different strategies accordingly. Framing the task as end-to-end Relation Extraction, we evaluated the performance of standard fine-tuning (BioGPT, GPT-2, and Seq2rel) and few-shot learning with open Large Language Models (LLaMA 7B-65B). In addition to their evaluation in few-shot settings, we explore the potential of open LLMs as synthetic data generators and propose a new workflow for this purpose. All evaluated models exhibited substantial improvements when fine-tuned on synthetic abstracts rather than the original noisy data. We provide our best-performing BioGPT-Large model (F1 = 59.0) for end-to-end RE of natural products relationships, along with all the training and evaluation datasets. See more details at https://github.com/idiap/abroad-re.
Citations: 0
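The abstract names a Greedy Maximum Entropy sampler inspired by diversity metrics in ecology; the authors' implementation lives at https://github.com/idiap/gme-sampler. The sketch below is only one plausible reading of the name, not the released code: greedily add the item whose inclusion maximizes the Shannon entropy of the selected set's label distribution (the labels here are hypothetical, standing in for whatever categories the sampler balances).

```python
# Illustrative greedy maximum-entropy selection over labelled items.
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in nats) of a label multiset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def greedy_max_entropy_sample(items, labels, k):
    """Pick k items, each time choosing the one whose label maximizes
    the entropy of the labels selected so far (ties broken by order)."""
    selected_idx, selected_labels = [], []
    remaining = list(range(len(items)))
    for _ in range(min(k, len(items))):
        best = max(remaining,
                   key=lambda i: shannon_entropy(selected_labels + [labels[i]]))
        selected_idx.append(best)
        selected_labels.append(labels[best])
        remaining.remove(best)
    return [items[i] for i in selected_idx]

# Toy usage: a balanced pick emerges from a label-skewed pool.
items  = ["a1", "a2", "a3", "b1", "c1"]
labels = ["A",  "A",  "A",  "B",  "C"]
print(greedy_max_entropy_sample(items, labels, 3))  # ['a1', 'b1', 'c1']
```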
Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-05-16 · DOI: 10.1162/coli_a_00519
Ran Zhang, Jihed Ouni, Steffen Eger
Abstract: While summarization has been extensively researched in natural language processing (NLP), cross-lingual cross-temporal summarization (CLCTS) is a largely unexplored area that has the potential to improve cross-cultural accessibility and understanding. This paper comprehensively addresses the CLCTS task, including dataset creation, modeling, and evaluation. We (1) build the first CLCTS corpus, with 328 (+127) instances for hDe-En and 289 (+212) for hEn-De, leveraging historical fiction texts and Wikipedia summaries in English and German; (2) examine the effectiveness of popular transformer end-to-end models with different intermediate fine-tuning tasks; (3) explore the potential of GPT-3.5 as a summarizer; and (4) report evaluations from humans, GPT-4, and several recent automatic evaluation metrics. Our results indicate that end-to-end models fine-tuned on intermediate tasks generate poor to moderate quality summaries, while GPT-3.5, as a zero-shot summarizer, provides moderate to good quality outputs. GPT-3.5 also seems very adept at normalizing historical text. To assess data contamination in GPT-3.5, we design an adversarial attack scheme and find that GPT-3.5 performs slightly worse on unseen source documents than on seen ones. Moreover, it sometimes hallucinates when the source sentences are inverted against its prior knowledge, with a summarization accuracy of 0.67 for plot omission, 0.71 for entity swap, and 0.53 for plot negation. Overall, our regression analysis of model performance suggests that longer, older, and more complex source texts (all of which are more characteristic of historical language variants) are harder to summarize for all models, indicating the difficulty of the CLCTS task. Regarding evaluation, we observe that both GPT-4 and BERTScore correlate moderately with human evaluations, but GPT-4 is prone to giving lower scores.
Citations: 0
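Among the automatic metrics the abstract evaluates, BERTScore is available as a pip-installable package (`bert-score`). A minimal usage sketch with made-up candidate and reference summaries, not drawn from the CLCTS corpus:

```python
# pip install bert-score
from bert_score import score

candidates = ["The knight returns home and reclaims his family estate."]
references = ["After years abroad, the knight comes back to recover his ancestral estate."]

# Returns per-sentence precision, recall, and F1 tensors.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.item():.3f}")
```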
Aligning Human and Computational Coherence Evaluations
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-05-02 · DOI: 10.1162/coli_a_00518
Jia Peng Lim, Hady W. Lauw
Abstract: Automated coherence metrics constitute an efficient and popular way to evaluate topic models. Previous works present a mixed picture of their presumed correlation with human judgment. This work proposes a novel sampling approach to mine topic representations at large scale while seeking to mitigate sampling bias, enabling the investigation of widely used automated coherence metrics via large corpora. Additionally, this article proposes a novel user study design, an amalgamation of different proxy tasks, to derive finer insight into human decision-making processes. This design subsumes the purpose of simple rating and outlier-detection user studies. Like the sampling approach, the user study is extensive, comprising forty participants split into eight study groups, each tasked with evaluating its respective set of one hundred topic representations. Usually, when substantiating the use of these metrics, human responses are treated as the gold standard. This article further investigates the reliability of human judgment by flipping the comparison and conducting a novel extended analysis of human responses at the group and individual level against a generic corpus. The results show a moderate to good correlation between these metrics and human judgment, especially for generic corpora, and yield further insights into the human perception of coherence. Analysing inter-metric correlations across corpora shows moderate to good correlation amongst these metrics. As these metrics depend on corpus statistics, this article further investigates the topical differences between corpora, revealing nuances in applications of these metrics.
Citations: 0
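The abstract does not spell out which automated coherence metrics are studied, so as an assumption, the sketch below implements one of the most common members of that family: average pairwise NPMI over a topic's top words, with co-occurrence probabilities estimated at the document level from a reference corpus.

```python
# Average pairwise NPMI coherence of a topic over a tokenized corpus.
import math
from itertools import combinations

def npmi_coherence(topic_words, documents):
    """documents: list of token lists; topic_words: top words of one topic."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def p(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p12 = p(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI
            continue
        pmi = math.log(p12 / (p(w1) * p(w2)))
        scores.append(pmi / -math.log(p12))  # normalize into [-1, 1]
    return sum(scores) / len(scores)

docs = [["topic", "model", "coherence"], ["topic", "model"], ["apple", "pie"]]
print(npmi_coherence(["topic", "model"], docs))  # 1.0: perfect co-occurrence
```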
Analyzing Dataset Annotation Quality Management in the Wild
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-03-25 · DOI: 10.1162/coli_a_00516
Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych
Abstract: Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models, as well as for their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts. While practices and guidelines regarding dataset creation projects exist, to our knowledge no large-scale analysis has yet been performed on how quality management is conducted when creating natural language datasets and whether these recommendations are followed. Therefore, we first survey and summarize recommended quality management practices for dataset creation as described in the literature and provide suggestions for applying them. Then, we compile a corpus of 591 scientific publications introducing text datasets and annotate it for quality-related aspects, such as annotator management, agreement, adjudication, and data validation. Using these annotations, we analyze how quality management is conducted in practice. A majority of the annotated publications apply good or excellent quality management; however, we deem the effort of 30% of the works only subpar. Our analysis also reveals common errors, especially in the use of inter-annotator agreement and the computation of annotation error rates.
Citations: 0
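One practice the survey annotates for, inter-annotator agreement, is often quantified with Cohen's kappa when exactly two annotators label the same items. A minimal sketch follows (the paper covers many agreement measures; this is illustrative only, with made-up labels):

```python
# Cohen's kappa: chance-corrected agreement between two annotators.
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    assert len(ann_a) == len(ann_b) and ann_a, "need paired annotations"
    n = len(ann_a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected chance agreement under independent labelling with each
    # annotator's own marginal label distribution.
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in set(ann_a) | set(ann_b))
    return (p_o - p_e) / (1 - p_e)

a = ["POS", "NEG", "POS", "NEG", "POS"]
b = ["POS", "NEG", "NEG", "NEG", "POS"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # 0.62
```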
Cognitive Plausibility in Natural Language Processing by Lisa Beinborn & Nora Hollenstein
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-03-21 · DOI: 10.1162/coli_r_00517
Yevgen Matusevych
Citations: 0
LLM-Assisted Data Augmentation for Chinese Dialogue-Level Dependency Parsing
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-03-12 · DOI: 10.1162/coli_a_00515
Meishan Zhang, Gongyao Jiang, Shuang Liu, Jing Chen, Min Zhang
Abstract: Dialogue-level dependency parsing, despite growing academic interest, often underperforms due to resource shortages. A potential solution to this challenge is data augmentation. In recent years, large language models (LLMs) have demonstrated strong generative capabilities that can greatly facilitate data augmentation. In this study, we focus on Chinese dialogue-level dependency parsing and present three simple and effective LLM-based strategies to augment the original training instances: word-level, syntax-level, and discourse-level augmentation. These strategies enable LLMs to either preserve or modify dependency structures, ensuring accuracy while increasing the diversity of instances at different levels. We conduct experiments on the benchmark dataset released by Jiang et al. (2023) to validate our approach. Results show that our method can greatly boost parsing performance in various settings, particularly for dependencies among elementary discourse units (EDUs). Lastly, we provide an in-depth analysis of the key points of our data augmentation strategies.
Citations: 0
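The word-level strategy hinges on an invariant: the gold dependency tree is copied while a surface form changes, so augmented instances stay consistent with their annotations. The sketch below illustrates only that invariant with a simplified CoNLL-style token structure; the LLM prompting that proposes the substitute word is the paper's contribution and is not reproduced here.

```python
# Word-level augmentation that preserves the gold dependency tree.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    form: str    # surface word
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # dependency relation label

def word_level_augment(sentence, position, new_form):
    """Swap one surface form (e.g., an LLM-proposed substitute) while
    leaving every head index and relation label untouched."""
    return [replace(tok, form=new_form) if tok.idx == position else tok
            for tok in sentence]

sent = [Token(1, "我", 2, "nsubj"), Token(2, "喜欢", 0, "root"), Token(3, "猫", 2, "obj")]
aug = word_level_augment(sent, 3, "狗")  # "cat" -> "dog"; tree unchanged
print([(t.form, t.head, t.deprel) for t in aug])
```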
A Systematic Review of Computational Approaches to Deciphering Bronze Age Aegean and Cypriot Scripts
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-03-08 · DOI: 10.1162/coli_a_00514
Maja Braović, Damir Krstinić, Maja Štula, Antonia Ivanda
Abstract: This paper provides a detailed overview of computational approaches to deciphering Bronze Age Aegean and Cypriot scripts, namely the Archanes script and the Archanes formula, the Phaistos Disk, Cretan hieroglyphic (including the Malia Altar Stone and the Arkalochori Axe), Linear A, Linear B, Cypro-Minoan, and the Cypriot scripts. The unique contributions of this paper are threefold: 1) a thorough review of the major Bronze Age Aegean and Cypriot scripts and inscriptions, the digital data and corpora associated with them, the existing computational decipherment methods developed for them, and possible links to other scripts and languages; 2) the definition of 15 major challenges that can be encountered in computational decipherment of ancient scripts; and 3) an outline of a computational model that could be used to simulate traditional decipherment processes of ancient scripts based on palaeography and epigraphy. In the context of this paper, the term decipherment denotes the process of discovering the language and/or the set of symbols behind an unknown script, and the meaning behind it.
Citations: 0
Automated Essay Scoring by Beata Beigman Klebanov and Nitin Madnani
IF 9.3 · Region 2, Computer Science
Computational Linguistics · Pub Date: 2024-03-04 · DOI: 10.1162/coli_r_00513
Anaïs Tack
Citations: 0