Annual Meeting of the Association for Computational Linguistics最新文献

筛选
英文 中文
MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System 迈向可靠的多模态讽刺检测系统
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-14 DOI: 10.48550/arXiv.2307.07135
Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che, Ruifeng Xu
{"title":"MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System","authors":"Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che, Ruifeng Xu","doi":"10.48550/arXiv.2307.07135","DOIUrl":"https://doi.org/10.48550/arXiv.2307.07135","url":null,"abstract":"Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has some shortcomings that hinder the development of reliable multi-modal sarcasm detection system: (1) There are some spurious cues in MMSD, leading to the model bias learning; (2) The negative samples in MMSD are not always reasonable. To solve the aforementioned issues, we introduce MMSD2.0, a correction dataset that fixes the shortcomings of MMSD, by removing the spurious cues and re-annotating the unreasonable samples. Meanwhile, we present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives (i.e., text, image, and text-image interaction view) for multi-modal sarcasm detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems and multi-view CLIP can significantly outperform the previous best baselines.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122356087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section 充分利用有限的上下文长度:预测能力因临床笔记类型和笔记部分而异
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-13 DOI: 10.48550/arXiv.2307.07051
Hongyi Zheng, Yixin Zhu, L. Jiang, K. Cho, E. Oermann
{"title":"Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section","authors":"Hongyi Zheng, Yixin Zhu, L. Jiang, K. Cho, E. Oermann","doi":"10.48550/arXiv.2307.07051","DOIUrl":"https://doi.org/10.48550/arXiv.2307.07051","url":null,"abstract":"We propose a data-driven framework to select clinical note sections with high predictive power.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134334136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models 自蒸馏量化:在基于变压器的语言模型中实现高压缩率
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-12 DOI: 10.48550/arXiv.2307.05972
James O'Neill, Sourav Dutta
{"title":"Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models","authors":"James O'Neill, Sourav Dutta","doi":"10.48550/arXiv.2307.05972","DOIUrl":"https://doi.org/10.48550/arXiv.2307.05972","url":null,"abstract":"We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models. We present a new method called self-distilled quantization (SDQ) that minimizes accumulative quantization errors and outperforms baselines. We apply SDQ to multilingual models XLM-R_{text{Base}} and InfoXLM_{text{Base}} and demonstrate that both models can be reduced from 32-bit floating point weights to 8-bit integer weights while maintaining a high level of performance on the XGLUE benchmark. Our results also highlight the challenges of quantizing multilingual models, which must generalize to languages they were not fine-tuned on.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114895989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ISLTranslate: Dataset for Translating Indian Sign Language ISLTranslate:印度手语翻译数据集
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-11 DOI: 10.48550/arXiv.2307.05440
Abhinav Joshi, Susmit Agrawal, Ashutosh Modi
{"title":"ISLTranslate: Dataset for Translating Indian Sign Language","authors":"Abhinav Joshi, Susmit Agrawal, Ashutosh Modi","doi":"10.48550/arXiv.2307.05440","DOIUrl":"https://doi.org/10.48550/arXiv.2307.05440","url":null,"abstract":"Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of sign language resources for the Indian sign language. This resource paper introduces ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best of our knowledge, it is the largest translation dataset for continuous Indian Sign Language. We provide a detailed analysis of the dataset. To validate the performance of existing end-to-end Sign language to spoken language translation systems, we benchmark the created dataset with a transformer-based model for ISL translation.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"C-24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126476057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features 赋能具有类型学特征的NLP模型的跨语言行为测试
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-11 DOI: 10.48550/arXiv.2307.05454
Ester Hlavnova, Sebastian Ruder
{"title":"Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features","authors":"Ester Hlavnova, Sebastian Ruder","doi":"10.48550/arXiv.2307.05454","DOIUrl":"https://doi.org/10.48550/arXiv.2307.05454","url":null,"abstract":"A challenge towards developing NLP systems for the world’s languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models’ behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finish. Our findings motivate the development of models that address these blind spots.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131616421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Learning to Generate Equitable Text in Dialogue from Biased Training Data 学习从有偏见的训练数据中生成公平的对话文本
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-10 DOI: 10.48550/arXiv.2307.04303
Anthony Sicilia, Malihe Alikhani
{"title":"Learning to Generate Equitable Text in Dialogue from Biased Training Data","authors":"Anthony Sicilia, Malihe Alikhani","doi":"10.48550/arXiv.2307.04303","DOIUrl":"https://doi.org/10.48550/arXiv.2307.04303","url":null,"abstract":"The ingrained principles of fairness in a dialogue system’s decision-making process and generated responses are crucial for user engagement, satisfaction, and task achievement. Absence of equitable and inclusive principles can hinder the formation of common ground, which in turn negatively impacts the overall performance of the system. For example, misusing pronouns in a user interaction may cause ambiguity about the intended subject. Yet, there is no comprehensive study of equitable text generation in dialogue. Aptly, in this work, we use theories of computational learning to study this problem. We provide formal definitions of equity in text generation, and further, prove formal connections between learning human-likeness and learning equity: algorithms for improving equity ultimately reduce to algorithms for improving human-likeness (on augmented data). With this insight, we also formulate reasonable conditions under which text generation algorithms can learn to generate equitable text without any modifications to the biased training data on which they learn. To exemplify our theory in practice, we look at a group of algorithms for the GuessWhat?! visual dialogue game and, using this example, test our theory empirically. Our theory accurately predicts relative-performance of multiple algorithms in generating equitable text as measured by both human and automated evaluation.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131324223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Answering Ambiguous Questions via Iterative Prompting 通过迭代提示回答模棱两可的问题
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-08 DOI: 10.48550/arXiv.2307.03897
Weiwei Sun, Hengyi Cai, Hongshen Chen, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Z. Ren
{"title":"Answering Ambiguous Questions via Iterative Prompting","authors":"Weiwei Sun, Hengyi Cai, Hongshen Chen, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Z. Ren","doi":"10.48550/arXiv.2307.03897","DOIUrl":"https://doi.org/10.48550/arXiv.2307.03897","url":null,"abstract":"In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist.To provide feasible answers to an ambiguous question,one approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity.An alternative is to gather candidate answers and aggregate them, but this method can be computationally costly and may neglect dependencies among answers.In this paper, we present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions.Specifically, we integrate an answering model with a prompting model in an iterative manner.The prompting model adaptively tracks the reading process and progressively triggers the answering model to compose distinct and relevant answers. Additionally, we develop a task-specific post-pretraining approach for both the answering model and the prompting model, which greatly improves the performance of our framework. Empirical studies on two commonly-used open benchmarks show that AmbigPrompt achieves state-of-the-art or competitive results while using less memory and having a lower inference latency than competing approaches. Additionally, AmbigPrompt also performs well in low-resource settings.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115639598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Incomplete Utterance Rewriting as Sequential Greedy Tagging 序贯贪婪标记的不完全话语重写
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-08 DOI: 10.48550/arXiv.2307.06337
Yuxiang Chen
{"title":"Incomplete Utterance Rewriting as Sequential Greedy Tagging","authors":"Yuxiang Chen","doi":"10.48550/arXiv.2307.06337","DOIUrl":"https://doi.org/10.48550/arXiv.2307.06337","url":null,"abstract":"The task of incomplete utterance rewriting has recently gotten much attention. Previous models struggled to extract information from the dialogue context, as evidenced by the low restoration scores. To address this issue, we propose a novel sequence tagging-based model, which is more adept at extracting information from context. Meanwhile, we introduce speaker-aware embedding to model speaker variation. Experiments on multiple public datasets show that our model achieves optimal results on all nine restoration scores while having other metric scores comparable to previous state-of-the-art models. Furthermore, benefitting from the model's simplicity, our approach outperforms most previous models on inference speed.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121784340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation 重新审视跨语言摘要:基于语料库的研究和改进标注的新基准
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-08 DOI: 10.48550/arXiv.2307.04018
Yulong Chen, Huajian Zhang, Yijie Zhou, Xuefeng Bai, Yueguan Wang, Ming Zhong, Jianhao Yan, Yafu Li, Judy Li, Xianchao Zhu, Yue Zhang
{"title":"Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation","authors":"Yulong Chen, Huajian Zhang, Yijie Zhou, Xuefeng Bai, Yueguan Wang, Ming Zhong, Jianhao Yan, Yafu Li, Judy Li, Xianchao Zhu, Yue Zhang","doi":"10.48550/arXiv.2307.04018","DOIUrl":"https://doi.org/10.48550/arXiv.2307.04018","url":null,"abstract":"Most existing cross-lingual summarization (CLS) work constructs CLS corpora by simply and directly translating pre-annotated summaries from one language to another, which can contain errors from both summarization and translation processes.To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark, through a new annotation schema that explicitly considers source input context.ConvSumX consists of 2 sub-tasks under different real-world scenarios, with each covering 3 language directions.We conduct thorough analysis on ConvSumX and 3 widely-used manually annotated CLS corpora and empirically find that ConvSumX is more faithful towards input text.Additionally, based on the same intuition, we propose a 2-Step method, which takes both conversation and summary as input to simulate human annotation process.Experimental results show that 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation.Analysis shows that both source input text and summary are crucial for modeling cross-lingual summaries.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128404172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Improving Automatic Quotation Attribution in Literary Novels 改进文学小说引文自动归因
Annual Meeting of the Association for Computational Linguistics Pub Date : 2023-07-07 DOI: 10.48550/arXiv.2307.03734
Krishnapriya Vishnubhotla, Frank Rudzicz, Graeme Hirst, Adam Hammond
{"title":"Improving Automatic Quotation Attribution in Literary Novels","authors":"Krishnapriya Vishnubhotla, Frank Rudzicz, Graeme Hirst, Adam Hammond","doi":"10.48550/arXiv.2307.03734","DOIUrl":"https://doi.org/10.48550/arXiv.2307.03734","url":null,"abstract":"Current models for quotation attribution in literary novels assume varying levels of available information in their training and test data, which poses a challenge for in-the-wild inference. Here, we approach quotation attribution as a set of four interconnected sub-tasks: character identification, coreference resolution, quotation identification, and speaker attribution. We benchmark state-of-the-art models on each of these sub-tasks independently, using a large dataset of annotated coreferences and quotations in literary novels (the Project Dialogism Novel Corpus). We also train and evaluate models for the speaker attribution task in particular, showing that a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120944219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信