Conference on Empirical Methods in Natural Language Processing: Latest Publications

Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-03-21 DOI: 10.18653/v1/2023.findings-emnlp.784
Kosuke Akimoto, Kunihiro Takeoka, M. Oyamada
{"title":"Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering","authors":"Kosuke Akimoto, Kunihiro Takeoka, M. Oyamada","doi":"10.18653/v1/2023.findings-emnlp.784","DOIUrl":"https://doi.org/10.18653/v1/2023.findings-emnlp.784","url":null,"abstract":"Retrieval-augmented generation models augment knowledge encoded in a language model by providing additional relevant external knowledge (context) during generation. Although it has been shown that the quantity and quality of context impact the performance of retrieval-augmented generation models during inference, limited research explores how these characteristics affect model training. This paper explores how context quantity and quality during model training affect the performance of Fusion-in-Decoder (FiD), the state-of-the-art retrieval-augmented generation model, in extractive open-domain question answering tasks. Experimental results suggest that FiD models overfit to context quality during training and show suboptimal performance when evaluated on different context quality. Through the experimental results, we also reveal FiD models trained with different context quality have different cross-attention distribution patterns. Specifically, as context quality during training increases, FiD models tend to attend more uniformly to each passage in context. Finally, based on these observations, we propose a method to mitigate overfitting to specific context quality by introducing bias to the cross-attention distribution, which we demonstrate to be effective in improving the performance of FiD models on different context quality.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":" 55","pages":"11711-11729"},"PeriodicalIF":0.0,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140221147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
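The abstract above attributes the fix to adding a bias to FiD's cross-attention distribution so that attention mass over retrieved passages can be flattened or sharpened at evaluation time. The snippet below is only a minimal illustration of that general idea, adding a hypothetical per-passage additive bias to raw cross-attention logits before the softmax; the exact form and placement of the bias in the paper may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_cross_attention(scores, passage_ids, passage_bias):
    # scores: (num_decoder_positions, num_context_tokens) raw cross-attention logits.
    # passage_bias: one additive value per passage, shifting attention mass
    # toward or away from whole passages before normalization.
    return softmax(scores + passage_bias[passage_ids], axis=-1)

rng = np.random.default_rng(0)
scores = rng.normal(size=(2, 12))          # 2 decoder positions, 12 context tokens
passage_ids = np.repeat(np.arange(3), 4)   # the 12 tokens come from 3 passages
bias = np.array([0.0, -1.0, -1.0])         # hypothetical: down-weight later passages
print(biased_cross_attention(scores, passage_ids, bias).round(3))
```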
Learning to Describe for Predicting Zero-shot Drug-Drug Interactions
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-03-13 DOI: 10.18653/v1/2023.emnlp-main.918
Fangqi Zhu, Yongqi Zhang, Lei Chen, Bing Qin, Ruifeng Xu
{"title":"Learning to Describe for Predicting Zero-shot Drug-Drug Interactions","authors":"Fangqi Zhu, Yongqi Zhang, Lei Chen, Bing Qin, Ruifeng Xu","doi":"10.18653/v1/2023.emnlp-main.918","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-main.918","url":null,"abstract":"Adverse drug-drug interactions~(DDIs) can compromise the effectiveness of concurrent drug administration, posing a significant challenge in healthcare. As the development of new drugs continues, the potential for unknown adverse effects resulting from DDIs becomes a growing concern. Traditional computational methods for DDI prediction may fail to capture interactions for new drugs due to the lack of knowledge. In this paper, we introduce a new problem setup as zero-shot DDI prediction that deals with the case of new drugs. Leveraging textual information from online databases like DrugBank and PubChem, we propose an innovative approach TextDDI with a language model-based DDI predictor and a reinforcement learning~(RL)-based information selector, enabling the selection of concise and pertinent text for accurate DDI prediction on new drugs. Empirical results show the benefits of the proposed approach on several settings including zero-shot and few-shot DDI prediction, and the selected texts are semantically relevant. Our code and data are available at url{https://github.com/zhufq00/DDIs-Prediction}.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"3 4","pages":"14855-14870"},"PeriodicalIF":0.0,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140245332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
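The approach described above pairs a text selector with a language-model predictor: relevant snippets from drug descriptions are chosen and concatenated into a concise input for the DDI classifier. The toy sketch below stands in for that pipeline with a purely lexical relevance scorer and a hand-written prompt template; in the paper the selector is trained with reinforcement learning and the predictor is a fine-tuned language model, so every function and string here is a hypothetical placeholder.

```python
def select_sentences(drug_pair, description_sents, k=2):
    # Stand-in selector: rank sentences by lexical overlap with the drug pair.
    # TextDDI instead trains an RL-based selector rewarded by the predictor's
    # accuracy; this overlap heuristic is only illustrative.
    query = {w.lower() for w in drug_pair}
    def score(sent):
        return len(query & {w.strip(".,").lower() for w in sent.split()})
    return sorted(description_sents, key=score, reverse=True)[:k]

def build_ddi_prompt(drug_pair, selected_sents):
    # Concise context assembled for a (hypothetical) language-model predictor.
    return (f"Drug A: {drug_pair[0]}. Drug B: {drug_pair[1]}. "
            f"Context: {' '.join(selected_sents)} "
            f"Question: what interaction occurs between Drug A and Drug B?")

sents = [
    "Aspirin inhibits platelet aggregation.",
    "Warfarin is an anticoagulant metabolized by CYP2C9.",
    "Aspirin may increase the bleeding risk associated with warfarin.",
]
pair = ("aspirin", "warfarin")
print(build_ddi_prompt(pair, select_sentences(pair, sents)))
```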
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-03-05 DOI: 10.18653/v1/2023.emnlp-main.565
Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang
{"title":"EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs","authors":"Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang","doi":"10.18653/v1/2023.emnlp-main.565","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-main.565","url":null,"abstract":"Large language models (LLMs) have proven to be very superior to conventional methods in various tasks. However, their expensive computations and high memory requirements are prohibitive for deployment. Model quantization is an effective method for reducing this overhead. The problem is that in most previous works, the quantized model was calibrated using few samples from the training data, which might affect the generalization of the quantized LLMs to unknown cases and tasks. Hence in this work, we explore an important question: Can we design a data-independent quantization method for LLMs to guarantee its generalization performance? In this work, we propose EasyQuant, a training-free and data-independent weight-only quantization algorithm for LLMs. Our observation indicates that two factors: outliers in the weight and quantization ranges, are essential for reducing the quantization error. Therefore, in EasyQuant, we leave the outliers (less than 1%) unchanged and optimize the quantization range to reduce the reconstruction error. With these methods, we surprisingly find that EasyQuant achieves comparable performance to the original model. Since EasyQuant does not depend on any training data, the generalization performance of quantized LLMs is safely guaranteed. Moreover, EasyQuant can be implemented in parallel so that the quantized model could be attained in a few minutes even for LLMs over 100B. To our best knowledge, we are the first work that achieves almost lossless quantization performance for LLMs under a data-independent setting and our algorithm runs over 10 times faster than the data-dependent methods.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"128 30","pages":"9119-9128"},"PeriodicalIF":0.0,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140078632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
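As described above, EasyQuant keeps the rare weight outliers in full precision and searches for a quantization range that minimizes per-tensor reconstruction error, with no calibration data. The sketch below illustrates that recipe in NumPy with a simple grid search over the clipping scale; the paper's actual optimizer, bit packing, and outlier handling are more elaborate, so treat this as an assumption-laden toy rather than the authors' implementation.

```python
import numpy as np

def easyquant_like(w, bits=4, outlier_frac=0.01, n_grid=50):
    # Keep the largest-magnitude weights (the "outliers") in full precision.
    thresh = np.quantile(np.abs(w), 1.0 - outlier_frac)
    outlier_mask = np.abs(w) > thresh
    body = np.where(outlier_mask, 0.0, w)

    qmax = 2 ** (bits - 1) - 1
    best_scale, best_err = None, np.inf
    # Grid-search the clipping scale (quantization range) that minimizes
    # the weight reconstruction error, with no calibration data involved.
    for frac in np.linspace(0.3, 1.0, n_grid):
        scale = frac * np.abs(body).max() / qmax
        if scale == 0.0:
            continue
        q = np.clip(np.round(body / scale), -qmax - 1, qmax)
        err = np.sum((q * scale - body) ** 2)
        if err < best_err:
            best_err, best_scale = err, scale

    q = np.clip(np.round(body / best_scale), -qmax - 1, qmax)
    w_hat = q * best_scale
    w_hat[outlier_mask] = w[outlier_mask]   # reinsert the untouched outliers
    return w_hat

w = np.random.default_rng(0).normal(size=(256, 64)).astype(np.float32)
w_hat = easyquant_like(w)
print("mean abs reconstruction error:", float(np.abs(w - w_hat).mean()))
```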
Exploiting Emotion-Semantic Correlations for Empathetic Response Generation
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-02-27 DOI: 10.18653/v1/2023.findings-emnlp.320
Zhou Yang, Zhaochun Ren, Yufeng Wang, Xiaofei Zhu, Zhihao Chen, Tiecheng Cai, Yunbing Wu, Yisong Su, Sibo Ju, Xiangwen Liao
{"title":"Exploiting Emotion-Semantic Correlations for Empathetic Response Generation","authors":"Zhou Yang, Zhaochun Ren, Yufeng Wang, Xiaofei Zhu, Zhihao Chen, Tiecheng Cai, Yunbing Wu, Yisong Su, Sibo Ju, Xiangwen Liao","doi":"10.18653/v1/2023.findings-emnlp.320","DOIUrl":"https://doi.org/10.18653/v1/2023.findings-emnlp.320","url":null,"abstract":"Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and have correlations with other grammar semantic roles, i.e., words with semantic meanings, in grammar. Previous methods overlook these two characteristics, which easily lead to misunderstandings of emotions and neglect of key semantics. To address this issue, we propose a dynamical Emotion-Semantic Correlation Model (ESCM) for empathetic dialogue generation tasks. ESCM constructs dynamic emotion-semantic vectors through the interaction of context and emotions. We introduce dependency trees to reflect the correlations between emotions and semantics. Based on dynamic emotion-semantic vectors and dependency trees, we propose a dynamic correlation graph convolutional network to guide the model in learning context meanings in dialogue and generating empathetic responses. Experimental results on the EMPATHETIC-DIALOGUES dataset show that ESCM understands semantics and emotions more accurately and expresses fluent and informative empathetic responses. Our analysis results also indicate that the correlations between emotions and semantics are frequently used in dialogues, which is of great significance for empathetic perception and expression.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"59 2","pages":"4826-4837"},"PeriodicalIF":0.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140424184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
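ESCM's core component is a graph convolutional network run over dependency-tree edges, letting emotion-bearing words exchange information with the semantic roles they correlate with. The paper's dynamic correlation graph construction is not reproduced here; the sketch below is only a generic, degree-normalized GCN layer applied to a fixed dependency adjacency matrix, with all shapes and values chosen purely for illustration.

```python
import numpy as np

def gcn_layer(h, adj, w):
    # One graph-convolution step over a dependency graph: average each
    # token's neighbourhood (with a self-loop), then apply a linear map + ReLU.
    adj_hat = adj + np.eye(adj.shape[0])
    deg = adj_hat.sum(axis=1, keepdims=True)
    return np.maximum((adj_hat / deg) @ h @ w, 0.0)

# Toy example: 4 tokens with dependency edges (0-1), (1-2), (1-3).
rng = np.random.default_rng(0)
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (1, 3)]:
    adj[i, j] = adj[j, i] = 1.0
h = rng.normal(size=(4, 8))        # token states (e.g. mixed emotion/semantic features)
w = rng.normal(size=(8, 8)) * 0.1  # layer weights
print(gcn_layer(h, adj, w).round(2))
```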
Multi-word Tokenization for Sequence Compression
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-02-15 DOI: 10.18653/v1/2023.emnlp-industry.58
Leonidas Gee, Leonardo Rigutini, M. Ernandes, Andrea Zugarini
{"title":"Multi-word Tokenization for Sequence Compression","authors":"Leonidas Gee, Leonardo Rigutini, M. Ernandes, Andrea Zugarini","doi":"10.18653/v1/2023.emnlp-industry.58","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-industry.58","url":null,"abstract":"Large Language Models have proven highly successful at modelling a variety of tasks. However, this comes at a steep computational cost that hinders wider industrial uptake. In this pa005 per, we present MWT: a Multi-Word Tokenizer that goes beyond word boundaries by representing frequent multi-word expressions as single tokens. MWTs produce a more compact and efficient tokenization that yields two benefits: (1) Increase in performance due to a greater coverage of input data given a fixed sequence length and budget; (2) Faster and lighter inference due to the ability to reduce the sequence length with negligible drops in performance. Our results show that MWT is more robust across shorter sequence lengths, thus allowing for major speedups via early sequence truncation.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"300 1","pages":"612-621"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139834030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
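The idea summarized above is to extend a tokenizer's vocabulary with frequent multi-word expressions so that common phrases cost one token instead of several. The sketch below shows the general mechanism at the word level: count frequent adjacent word pairs in a corpus, then merge them greedily at tokenization time. The real MWT is built on top of an existing subword tokenizer with a proper MWE-selection step, so these functions are illustrative stand-ins only.

```python
from collections import Counter

def build_mwt_vocab(corpus, n_merges=1000):
    # Count adjacent word pairs and keep the most frequent ones as
    # candidate multi-word tokens.
    pair_counts = Counter()
    for text in corpus:
        words = text.split()
        pair_counts.update(zip(words, words[1:]))
    return {" ".join(pair) for pair, _ in pair_counts.most_common(n_merges)}

def tokenize_with_mwt(text, mwt_vocab):
    # Greedy left-to-right merge: emit a frequent word pair as a single token.
    words, tokens, i = text.split(), [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if pair in mwt_vocab:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

corpus = ["the stock market closed higher", "the stock market opened lower"]
vocab = build_mwt_vocab(corpus, n_merges=2)
print(tokenize_with_mwt("the stock market closed higher today", vocab))
```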
BUSTER: a "BUSiness Transaction Entity Recognition" dataset BUSTER:"商业交易实体识别 "数据集
Conference on Empirical Methods in Natural Language Processing Pub Date : 2024-02-15 DOI: 10.18653/v1/2023.emnlp-industry.57
Andrea Zugarini, Andrew Zamai, M. Ernandes, Leonardo Rigutini
{"title":"BUSTER: a \"BUSiness Transaction Entity Recognition\" dataset","authors":"Andrea Zugarini, Andrew Zamai, M. Ernandes, Leonardo Rigutini","doi":"10.18653/v1/2023.emnlp-industry.57","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-industry.57","url":null,"abstract":"Albeit Natural Language Processing has seen major breakthroughs in the last few years, transferring such advances into real-world business cases can be challenging. One of the reasons resides in the displacement between popular benchmarks and actual data. Lack of supervision, unbalanced classes, noisy data and long documents often affect real problems in vertical domains such as finance, law and health. To support industry-oriented research, we present BUSTER, a BUSiness Transaction Entity Recognition dataset. The dataset consists of 3779 manually annotated documents on financial transactions. We establish several baselines exploiting both general-purpose and domain-specific language models. The best performing model is also used to automatically annotate 6196 documents, which we release as an additional silver corpus to BUSTER.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"1 11","pages":"605-611"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139774605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BUSTER: a "BUSiness Transaction Entity Recognition" dataset BUSTER:"商业交易实体识别 "数据集
Conference on Empirical Methods in Natural Language Processing Pub Date : 2024-02-15 DOI: 10.18653/v1/2023.emnlp-industry.57
Andrea Zugarini, Andrew Zamai, M. Ernandes, Leonardo Rigutini
{"title":"BUSTER: a \"BUSiness Transaction Entity Recognition\" dataset","authors":"Andrea Zugarini, Andrew Zamai, M. Ernandes, Leonardo Rigutini","doi":"10.18653/v1/2023.emnlp-industry.57","DOIUrl":"https://doi.org/10.18653/v1/2023.emnlp-industry.57","url":null,"abstract":"Albeit Natural Language Processing has seen major breakthroughs in the last few years, transferring such advances into real-world business cases can be challenging. One of the reasons resides in the displacement between popular benchmarks and actual data. Lack of supervision, unbalanced classes, noisy data and long documents often affect real problems in vertical domains such as finance, law and health. To support industry-oriented research, we present BUSTER, a BUSiness Transaction Entity Recognition dataset. The dataset consists of 3779 manually annotated documents on financial transactions. We establish several baselines exploiting both general-purpose and domain-specific language models. The best performing model is also used to automatically annotate 6196 documents, which we release as an additional silver corpus to BUSTER.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"406 1","pages":"605-611"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139834359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning
Conference on Empirical Methods in Natural Language Processing Pub Date: 2024-02-13 DOI: 10.18653/v1/2023.findings-emnlp.329
Haeju Lee, Minchan Jeong, SeYoung Yun, Kee-Eung Kim
{"title":"Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning","authors":"Haeju Lee, Minchan Jeong, SeYoung Yun, Kee-Eung Kim","doi":"10.18653/v1/2023.findings-emnlp.329","DOIUrl":"https://doi.org/10.18653/v1/2023.findings-emnlp.329","url":null,"abstract":"Prompt tuning, in which prompts are optimized to adapt large-scale pre-trained language models to downstream tasks instead of fine-tuning the full model parameters, has been shown to be particularly effective when the prompts are trained in a multi-task transfer learning setting. These methods generally involve individually training prompts for each source task and then aggregating them to provide the initialization of the prompt for the target task. However, this approach critically ignores the fact that some of the source tasks could be negatively or positively interfering with each other. We argue that when we extract knowledge from source tasks via training source prompts, we need to consider this correlation among source tasks for better transfer to target tasks. To this end, we propose a Bayesian approach where we work with the posterior distribution of prompts across source tasks. We obtain representative source prompts corresponding to the samples from the posterior utilizing Stein Variational Gradient Descent, which are then aggregated to constitute the initial target prompt. We show extensive experimental results on the standard benchmark NLP tasks, where our Bayesian multi-task transfer learning approach outperforms the state-of-the-art methods in many settings. Furthermore, our approach requires no auxiliary models other than the prompt itself, achieving a high degree of parameter efficiency.","PeriodicalId":505350,"journal":{"name":"Conference on Empirical Methods in Natural Language Processing","volume":"28 6","pages":"4942-4958"},"PeriodicalIF":0.0,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139841096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
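The method above draws representative source prompts as particles from a posterior using Stein Variational Gradient Descent (SVGD) and then aggregates them into the target prompt's initialization. The snippet below implements one standard SVGD update with an RBF kernel on a toy Gaussian target, with prompts abstracted to plain vectors; the paper's actual posterior, kernel bandwidth, and aggregation rule are not reproduced, so this only sketches the SVGD step itself, and the final averaging is an assumption.

```python
import numpy as np

def rbf_kernel(x, h=1.0):
    # x: (n, d) particles. Returns k(x_j, x_i) and grad_{x_j} k(x_j, x_i).
    diff = x[:, None, :] - x[None, :, :]             # diff[j, i] = x_j - x_i
    k = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))  # (n, n)
    grad_k = -diff * k[..., None] / h ** 2           # (n, n, d)
    return k, grad_k

def svgd_step(x, grad_logp, step=0.1, h=1.0):
    # One SVGD update for all particles:
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, h)
    phi = (k @ grad_logp(x) + grad_k.sum(axis=0)) / n
    return x + step * phi

# Toy posterior: a standard Gaussian over 16-dimensional "prompt" vectors.
grad_logp = lambda x: -x
particles = np.random.default_rng(0).normal(3.0, 1.0, size=(8, 16))
for _ in range(200):
    particles = svgd_step(particles, grad_logp)
init_prompt = particles.mean(axis=0)  # assumed aggregation: simple average of samples
print(init_prompt[:4].round(2))
```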