Conference on Empirical Methods in Natural Language Processing: Latest Publications

Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-03-21 DOI: 10.18653/v1/2023.findings-emnlp.784
Kosuke Akimoto, Kunihiro Takeoka, M. Oyamada
{"title":"Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering","authors":"Kosuke Akimoto, Kunihiro Takeoka, M. Oyamada","doi":"10.18653/v1/2023.findings-emnlp.784","DOIUrl":"https://doi.org/10.18653/v1/2023.findings-emnlp.784","url":null,"abstract":"Retrieval-augmented generation models augment knowledge encoded in a language model by providing additional relevant external knowledge (context) during generation. Although it has been shown that the quantity and quality of context impact the performance of retrieval-augmented generation models during inference, limited research explores how these characteristics affect model training. This paper explores how context quantity and quality during model training affect the performance of Fusion-in-Decoder (FiD), the state-of-the-art retrieval-augmented generation model, in extractive open-domain question answering tasks. Experimental results suggest that FiD models overfit to context quality during training and show suboptimal performance when evaluated on different context quality. Through the experimental results, we also reveal FiD models trained with different context quality have different cross-attention distribution patterns. Specifically, as context quality during training increases, FiD models tend to attend more uniformly to each passage in context. Finally, based on these observations, we propose a method to mitigate overfitting to specific context quality by introducing bias to the cross-attention distribution, which we demonstrate to be effective in improving the performance of FiD models on different context quality.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":" 55","pages":"11711-11729"},"PeriodicalIF":0.0,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140221147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
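The abstract above attributes the fix to adding a bias to FiD's cross-attention distribution so that attention mass over retrieved passages can be flattened or sharpened at evaluation time. The snippet below is only a minimal illustration of that general idea, adding a hypothetical per-passage additive bias to raw cross-attention logits before the softmax; the exact form and placement of the bias in the paper may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_cross_attention(scores, passage_ids, passage_bias):
    # scores: (num_decoder_positions, num_context_tokens) raw cross-attention logits.
    # passage_bias: one additive value per passage, shifting attention mass
    # toward or away from whole passages before normalization.
    return softmax(scores + passage_bias[passage_ids], axis=-1)

rng = np.random.default_rng(0)
scores = rng.normal(size=(2, 12))          # 2 decoder positions, 12 context tokens
passage_ids = np.repeat(np.arange(3), 4)   # the 12 tokens come from 3 passages
bias = np.array([0.0, -1.0, -1.0])         # hypothetical: down-weight later passages
print(biased_cross_attention(scores, passage_ids, bias).round(3))
```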
Learning to Describe for Predicting Zero-shot Drug-Drug Interactions
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-03-13 DOI: 10.18653/v1/2023.emnlp-main.918
Fangqi Zhu, Yongqi Zhang, Lei Chen, Bing Qin, Ruifeng Xu
{"title":"Learning to Describe for Predicting Zero-shot Drug-Drug Interactions","authors":"Fangqi Zhu, Yongqi Zhang, Lei Chen, Bing Qin, Ruifeng Xu","doi":"10.18653/v1/2023.emnlp-main.918","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-main.918","url":null,"abstract":"Adverse drug-drug interactions~(DDIs) can compromise the effectiveness of concurrent drug administration, posing a significant challenge in healthcare. As the development of new drugs continues, the potential for unknown adverse effects resulting from DDIs becomes a growing concern. Traditional computational methods for DDI prediction may fail to capture interactions for new drugs due to the lack of knowledge. In this paper, we introduce a new problem setup as zero-shot DDI prediction that deals with the case of new drugs. Leveraging textual information from online databases like DrugBank and PubChem, we propose an innovative approach TextDDI with a language model-based DDI predictor and a reinforcement learning~(RL)-based information selector, enabling the selection of concise and pertinent text for accurate DDI prediction on new drugs. Empirical results show the benefits of the proposed approach on several settings including zero-shot and few-shot DDI prediction, and the selected texts are semantically relevant. Our code and data are available at url{https://github.com/zhufq00/DDIs-Prediction}.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"3 4","pages":"14855-14870"},"PeriodicalIF":0.0,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140245332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
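The approach described above pairs a text selector with a language-model predictor: relevant snippets from drug descriptions are chosen and concatenated into a concise input for the DDI classifier. The toy sketch below stands in for that pipeline with a purely lexical relevance scorer and a hand-written prompt template; in the paper the selector is trained with reinforcement learning and the predictor is a fine-tuned language model, so every function and string here is a hypothetical placeholder.

```python
def select_sentences(drug_pair, description_sents, k=2):
    # Stand-in selector: rank sentences by lexical overlap with the drug pair.
    # TextDDI instead trains an RL-based selector rewarded by the predictor's
    # accuracy; this overlap heuristic is only illustrative.
    query = {w.lower() for w in drug_pair}
    def score(sent):
        return len(query & {w.strip(".,").lower() for w in sent.split()})
    return sorted(description_sents, key=score, reverse=True)[:k]

def build_ddi_prompt(drug_pair, selected_sents):
    # Concise context assembled for a (hypothetical) language-model predictor.
    return (f"Drug A: {drug_pair[0]}. Drug B: {drug_pair[1]}. "
            f"Context: {' '.join(selected_sents)} "
            f"Question: what interaction occurs between Drug A and Drug B?")

sents = [
    "Aspirin inhibits platelet aggregation.",
    "Warfarin is an anticoagulant metabolized by CYP2C9.",
    "Aspirin may increase the bleeding risk associated with warfarin.",
]
pair = ("aspirin", "warfarin")
print(build_ddi_prompt(pair, select_sentences(pair, sents)))
```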
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-03-05 DOI: 10.18653/v1/2023.emnlp-main.565
Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang
{"title":"EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs","authors":"Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang","doi":"10.18653/v1/2023.emnlp-main.565","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-main.565","url":null,"abstract":"Large language models (LLMs) have proven to be very superior to conventional methods in various tasks. However, their expensive computations and high memory requirements are prohibitive for deployment. Model quantization is an effective method for reducing this overhead. The problem is that in most previous works, the quantized model was calibrated using few samples from the training data, which might affect the generalization of the quantized LLMs to unknown cases and tasks. Hence in this work, we explore an important question: Can we design a data-independent quantization method for LLMs to guarantee its generalization performance? In this work, we propose EasyQuant, a training-free and data-independent weight-only quantization algorithm for LLMs. Our observation indicates that two factors: outliers in the weight and quantization ranges, are essential for reducing the quantization error. Therefore, in EasyQuant, we leave the outliers (less than 1%) unchanged and optimize the quantization range to reduce the reconstruction error. With these methods, we surprisingly find that EasyQuant achieves comparable performance to the original model. Since EasyQuant does not depend on any training data, the generalization performance of quantized LLMs is safely guaranteed. Moreover, EasyQuant can be implemented in parallel so that the quantized model could be attained in a few minutes even for LLMs over 100B. To our best knowledge, we are the first work that achieves almost lossless quantization performance for LLMs under a data-independent setting and our algorithm runs over 10 times faster than the data-dependent methods.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"128 30","pages":"9119-9128"},"PeriodicalIF":0.0,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140078632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
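As described above, EasyQuant keeps the rare weight outliers in full precision and searches for a quantization range that minimizes per-tensor reconstruction error, with no calibration data. The sketch below illustrates that recipe in NumPy with a simple grid search over the clipping scale; the paper's actual optimizer, bit packing, and outlier handling are more elaborate, so treat this as an assumption-laden toy rather than the authors' implementation.

```python
import numpy as np

def easyquant_like(w, bits=4, outlier_frac=0.01, n_grid=50):
    # Keep the largest-magnitude weights (the "outliers") in full precision.
    thresh = np.quantile(np.abs(w), 1.0 - outlier_frac)
    outlier_mask = np.abs(w) > thresh
    body = np.where(outlier_mask, 0.0, w)

    qmax = 2 ** (bits - 1) - 1
    best_scale, best_err = None, np.inf
    # Grid-search the clipping scale (quantization range) that minimizes
    # the weight reconstruction error, with no calibration data involved.
    for frac in np.linspace(0.3, 1.0, n_grid):
        scale = frac * np.abs(body).max() / qmax
        if scale == 0.0:
            continue
        q = np.clip(np.round(body / scale), -qmax - 1, qmax)
        err = np.sum((q * scale - body) ** 2)
        if err < best_err:
            best_err, best_scale = err, scale

    q = np.clip(np.round(body / best_scale), -qmax - 1, qmax)
    w_hat = q * best_scale
    w_hat[outlier_mask] = w[outlier_mask]   # reinsert the untouched outliers
    return w_hat

w = np.random.default_rng(0).normal(size=(256, 64)).astype(np.float32)
w_hat = easyquant_like(w)
print("mean abs reconstruction error:", float(np.abs(w - w_hat).mean()))
```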
Exploiting Emotion-Semantic Correlations for Empathetic Response Generation
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-02-27 DOI: 10.18653/v1/2023.findings-emnlp.320
Zhou Yang, Zhaochun Ren, Yufeng Wang, Xiaofei Zhu, Zhihao Chen, Tiecheng Cai, Yunbing Wu, Yisong Su, Sibo Ju, Xiangwen Liao
{"title":"Exploiting Emotion-Semantic Correlations for Empathetic Response Generation","authors":"Zhou Yang, Zhaochun Ren, Yufeng Wang, Xiaofei Zhu, Zhihao Chen, Tiecheng Cai, Yunbing Wu, Yisong Su, Sibo Ju, Xiangwen Liao","doi":"10.18653/v1/2023.findings-emnlp.320","DOIUrl":"https://doi.org/10.18653/v1/2023.findings-emnlp.320","url":null,"abstract":"Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and have correlations with other grammar semantic roles, i.e., words with semantic meanings, in grammar. Previous methods overlook these two characteristics, which easily lead to misunderstandings of emotions and neglect of key semantics. To address this issue, we propose a dynamical Emotion-Semantic Correlation Model (ESCM) for empathetic dialogue generation tasks. ESCM constructs dynamic emotion-semantic vectors through the interaction of context and emotions. We introduce dependency trees to reflect the correlations between emotions and semantics. Based on dynamic emotion-semantic vectors and dependency trees, we propose a dynamic correlation graph convolutional network to guide the model in learning context meanings in dialogue and generating empathetic responses. Experimental results on the EMPATHETIC-DIALOGUES dataset show that ESCM understands semantics and emotions more accurately and expresses fluent and informative empathetic responses. Our analysis results also indicate that the correlations between emotions and semantics are frequently used in dialogues, which is of great significance for empathetic perception and expression.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"59 2","pages":"4826-4837"},"PeriodicalIF":0.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140424184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
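ESCM's core component is a graph convolutional network run over dependency-tree edges, letting emotion-bearing words exchange information with the semantic roles they correlate with. The paper's dynamic correlation graph construction is not reproduced here; the sketch below is only a generic, degree-normalized GCN layer applied to a fixed dependency adjacency matrix, with all shapes and values chosen purely for illustration.

```python
import numpy as np

def gcn_layer(h, adj, w):
    # One graph-convolution step over a dependency graph: average each
    # token's neighbourhood (with a self-loop), then apply a linear map + ReLU.
    adj_hat = adj + np.eye(adj.shape[0])
    deg = adj_hat.sum(axis=1, keepdims=True)
    return np.maximum((adj_hat / deg) @ h @ w, 0.0)

# Toy example: 4 tokens with dependency edges (0-1), (1-2), (1-3).
rng = np.random.default_rng(0)
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (1, 3)]:
    adj[i, j] = adj[j, i] = 1.0
h = rng.normal(size=(4, 8))        # token states (e.g. mixed emotion/semantic features)
w = rng.normal(size=(8, 8)) * 0.1  # layer weights
print(gcn_layer(h, adj, w).round(2))
```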
Multi-word Tokenization for Sequence Compression
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-02-15 DOI: 10.18653/v1/2023.emnlp-industry.58
Leonidas Gee, Leonardo Rigutini, M. Ernandes, Andrea Zugarini
{"title":"Multi-word Tokenization for Sequence Compression","authors":"Leonidas Gee, Leonardo Rigutini, M. Ernandes, Andrea Zugarini","doi":"10.18653/v1/2023.emnlp-industry.58","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-industry.58","url":null,"abstract":"Large Language Models have proven highly successful at modelling a variety of tasks. However, this comes at a steep computational cost that hinders wider industrial uptake. In this pa005 per, we present MWT: a Multi-Word Tokenizer that goes beyond word boundaries by representing frequent multi-word expressions as single tokens. MWTs produce a more compact and efficient tokenization that yields two benefits: (1) Increase in performance due to a greater coverage of input data given a fixed sequence length and budget; (2) Faster and lighter inference due to the ability to reduce the sequence length with negligible drops in performance. Our results show that MWT is more robust across shorter sequence lengths, thus allowing for major speedups via early sequence truncation.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"300 1","pages":"612-621"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139834030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
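The idea summarized above is to extend a tokenizer's vocabulary with frequent multi-word expressions so that common phrases cost one token instead of several. The sketch below shows the general mechanism at the word level: count frequent adjacent word pairs in a corpus, then merge them greedily at tokenization time. The real MWT is built on top of an existing subword tokenizer with a proper MWE-selection step, so these functions are illustrative stand-ins only.

```python
from collections import Counter

def build_mwt_vocab(corpus, n_merges=1000):
    # Count adjacent word pairs and keep the most frequent ones as
    # candidate multi-word tokens.
    pair_counts = Counter()
    for text in corpus:
        words = text.split()
        pair_counts.update(zip(words, words[1:]))
    return {" ".join(pair) for pair, _ in pair_counts.most_common(n_merges)}

def tokenize_with_mwt(text, mwt_vocab):
    # Greedy left-to-right merge: emit a frequent word pair as a single token.
    words, tokens, i = text.split(), [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if pair in mwt_vocab:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

corpus = ["the stock market closed higher", "the stock market opened lower"]
vocab = build_mwt_vocab(corpus, n_merges=2)
print(tokenize_with_mwt("the stock market closed higher today", vocab))
```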
BUSTER: a "BUSiness Transaction Entity Recognition" dataset BUSTER:"商业交易实体识别 "数据集
Conference on Empirical Methods in Natural Language Processing Pub Date : 2024-02-15 DOI: 10.18653/v1/2023.emnlp-industry.57
Andrea Zugarini, Andrew Zamai, M. Ernandes, Leonardo Rigutini
{"title":"BUSTER: a \"BUSiness Transaction Entity Recognition\" dataset","authors":"Andrea Zugarini, Andrew Zamai, M. Ernandes, Leonardo Rigutini","doi":"10.18653/v1/2023.emnlp-industry.57","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-industry.57","url":null,"abstract":"Albeit Natural Language Processing has seen major breakthroughs in the last few years, transferring such advances into real-world business cases can be challenging. One of the reasons resides in the displacement between popular benchmarks and actual data. Lack of supervision, unbalanced classes, noisy data and long documents often affect real problems in vertical domains such as finance, law and health. To support industry-oriented research, we present BUSTER, a BUSiness Transaction Entity Recognition dataset. The dataset consists of 3779 manually annotated documents on financial transactions. We establish several baselines exploiting both general-purpose and domain-specific language models. The best performing model is also used to automatically annotate 6196 documents, which we release as an additional silver corpus to BUSTER.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"1 11","pages":"605-611"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139774605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BUSTER: a "BUSiness Transaction Entity Recognition" dataset BUSTER:"商业交易实体识别 "数据集
Conference on Empirical Methods in Natural Language Processing Pub Date : 2024-02-15 DOI: 10.18653/v1/2023.emnlp-industry.57
Andrea Zugarini, Andrew Zamai, M. Ernandes, Leonardo Rigutini
{"title":"BUSTER: a \"BUSiness Transaction Entity Recognition\" dataset","authors":"Andrea Zugarini, Andrew Zamai, M. Ernandes, Leonardo Rigutini","doi":"10.18653/v1/2023.emnlp-industry.57","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-industry.57","url":null,"abstract":"Albeit Natural Language Processing has seen major breakthroughs in the last few years, transferring such advances into real-world business cases can be challenging. One of the reasons resides in the displacement between popular benchmarks and actual data. Lack of supervision, unbalanced classes, noisy data and long documents often affect real problems in vertical domains such as finance, law and health. To support industry-oriented research, we present BUSTER, a BUSiness Transaction Entity Recognition dataset. The dataset consists of 3779 manually annotated documents on financial transactions. We establish several baselines exploiting both general-purpose and domain-specific language models. The best performing model is also used to automatically annotate 6196 documents, which we release as an additional silver corpus to BUSTER.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"406 1","pages":"605-611"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139834359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-02-13 DOI: 10.18653/v1/2023.findings-emnlp.329
Haeju Lee, Minchan Jeong, SeYoung Yun, Kee-Eung Kim
{"title":"Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning","authors":"Haeju Lee, Minchan Jeong, SeYoung Yun, Kee-Eung Kim","doi":"10.18653/v1/2023.findings-emnlp.329","DOIUrl":"https://doi.org/10.18653/v1/2023.findings-emnlp.329","url":null,"abstract":"Prompt tuning, in which prompts are optimized to adapt large-scale pre-trained language models to downstream tasks instead of fine-tuning the full model parameters, has been shown to be particularly effective when the prompts are trained in a multi-task transfer learning setting. These methods generally involve individually training prompts for each source task and then aggregating them to provide the initialization of the prompt for the target task. However, this approach critically ignores the fact that some of the source tasks could be negatively or positively interfering with each other. We argue that when we extract knowledge from source tasks via training source prompts, we need to consider this correlation among source tasks for better transfer to target tasks. To this end, we propose a Bayesian approach where we work with the posterior distribution of prompts across source tasks. We obtain representative source prompts corresponding to the samples from the posterior utilizing Stein Variational Gradient Descent, which are then aggregated to constitute the initial target prompt. We show extensive experimental results on the standard benchmark NLP tasks, where our Bayesian multi-task transfer learning approach outperforms the state-of-the-art methods in many settings. Furthermore, our approach requires no auxiliary models other than the prompt itself, achieving a high degree of parameter efficiency.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"28 6","pages":"4942-4958"},"PeriodicalIF":0.0,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139841096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
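The method above draws representative source prompts as particles from a posterior using Stein Variational Gradient Descent (SVGD) and then aggregates them into the target prompt's initialization. The snippet below implements one standard SVGD update with an RBF kernel on a toy Gaussian target, with prompts abstracted to plain vectors; the paper's actual posterior, kernel bandwidth, and aggregation rule are not reproduced, so this only sketches the SVGD step itself, and the final averaging is an assumption.

```python
import numpy as np

def rbf_kernel(x, h=1.0):
    # x: (n, d) particles. Returns k(x_j, x_i) and grad_{x_j} k(x_j, x_i).
    diff = x[:, None, :] - x[None, :, :]             # diff[j, i] = x_j - x_i
    k = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))  # (n, n)
    grad_k = -diff * k[..., None] / h ** 2           # (n, n, d)
    return k, grad_k

def svgd_step(x, grad_logp, step=0.1, h=1.0):
    # One SVGD update for all particles:
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, h)
    phi = (k @ grad_logp(x) + grad_k.sum(axis=0)) / n
    return x + step * phi

# Toy posterior: a standard Gaussian over 16-dimensional "prompt" vectors.
grad_logp = lambda x: -x
particles = np.random.default_rng(0).normal(3.0, 1.0, size=(8, 16))
for _ in range(200):
    particles = svgd_step(particles, grad_logp)
init_prompt = particles.mean(axis=0)  # assumed aggregation: simple average of samples
print(init_prompt[:4].round(2))
```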