{"title":"Disentangling Task Relations for Few-shot Text Classification via Self-Supervised Hierarchical Task Clustering","authors":"Juan Zha, Zheng Li, Ying Wei, Yu Zhang","doi":"10.48550/arXiv.2211.08588","DOIUrl":"https://doi.org/10.48550/arXiv.2211.08588","url":null,"abstract":"Few-Shot Text Classification (FSTC) imitates humans to learn a new text classifier efficiently with only few examples, by leveraging prior knowledge from historical tasks. However, most prior works assume that all the tasks are sampled from a single data source, which cannot adapt to real-world scenarios where tasks are heterogeneous and lie in different distributions. As such, existing methods may suffer from their globally knowledge-shared mechanisms to handle the task heterogeneity. On the other hand, inherent task relation are not explicitly captured, making task knowledge unorganized and hard to transfer to new tasks. Thus, we explore a new FSTC setting where tasks can come from a diverse range of data sources. To address the task heterogeneity, we propose a self-supervised hierarchical task clustering (SS-HTC) method. SS-HTC not only customizes cluster-specific knowledge by dynamically organizing heterogeneous tasks into different clusters in hierarchical levels but also disentangles underlying relations between tasks to improve the interpretability. Extensive experiments on five public FSTC benchmark datasets demonstrate the effectiveness of SS-HTC.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"26 1","pages":"5236-5247"},"PeriodicalIF":0.0,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86138670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CDialog: A Multi-turn Covid-19 Conversation Dataset for Entity-Aware Dialog Generation","authors":"Deeksha Varshney, Aizan Zafar, Niranshu Kumar Behra, Asif Ekbal","doi":"10.48550/arXiv.2212.06049","DOIUrl":"https://doi.org/10.48550/arXiv.2212.06049","url":null,"abstract":"The development of conversational agents to interact with patients and deliver clinical advice has attracted the interest of many researchers, particularly in light of the COVID-19 pandemic. The training of an end-to-end neural based dialog system, on the other hand, is hampered by a lack of multi-turn medical dialog corpus. We make the very first attempt to release a high-quality multi-turn Medical Dialog dataset relating to Covid-19 disease named CDialog, with over 1K conversations collected from the online medical counselling websites. We annotate each utterance of the conversation with seven different categories of medical entities, including diseases, symptoms, medical tests, medical history, remedies, medications and other aspects as additional labels. Finally, we propose a novel neural medical dialog system based on the CDialog dataset to advance future research on developing automated medical dialog systems. We use pre-trained language models for dialogue generation, incorporating annotated medical entities, to generate a virtual doctor's response that addresses the patient's query. Experimental results show that the proposed dialog models perform comparably better when supplemented with entity information and hence can improve the response quality.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"73 1","pages":"11373-11385"},"PeriodicalIF":0.0,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86378176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Measuring the Intrinsic Few-Shot Hardness of Datasets","authors":"Xinran Zhao, Shikhar Murty, Christopher D. Manning","doi":"10.48550/arXiv.2211.09113","DOIUrl":"https://doi.org/10.48550/arXiv.2211.09113","url":null,"abstract":"While advances in pre-training have led to dramatic improvements in few-shot learning of NLP tasks, there is limited understanding of what drives successful few-shot adaptation in datasets. In particular, given a new dataset and a pre-trained model, what properties of the dataset make it few-shot learnable, and are these properties independent of the specific adaptation techniques used? We consider an extensive set of recent few-shot learning methods and show that their performance across a large number of datasets is highly correlated, showing that few-shot hardness may be intrinsic to datasets, for a given pre-trained model. To estimate intrinsic few-shot hardness, we then propose a simple and lightweight metric called Spread that captures the intuition that few-shot learning is made possible by exploiting feature-space invariances between training and test samples. Our metric better accounts for few-shot hardness compared to existing notions of hardness and is ~8-100x faster to compute.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"34 1","pages":"3955-3963"},"PeriodicalIF":0.0,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76572642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality","authors":"Pei Zhou, Hyundong Justin Cho, Pegah Jandaghi, Dong-Ho Lee, Bill Yuchen Lin, J. Pujara, Xiang Ren","doi":"10.48550/arXiv.2211.09267","DOIUrl":"https://doi.org/10.48550/arXiv.2211.09267","url":null,"abstract":"Human communication relies on common ground (CG), the mutual knowledge and beliefs shared by participants, to produce coherent and interesting conversations. In this paper, we demonstrate that current response generation (RG) models produce generic and dull responses in dialogues because they act reflexively, failing to explicitly model CG, both due to the lack of CG in training data and the standard RG training procedure. We introduce Reflect, a dataset that annotates dialogues with explicit CG (materialized as inferences approximating shared knowledge and beliefs) and solicits 9k diverse human-generated responses each following one common ground. Using Reflect, we showcase the limitations of current dialogue data and RG models: less than half of the responses in current data is rated as high quality (sensible, specific, and interesting) and models trained using this data have even lower quality, while most Reflect responses are judged high quality. Next, we analyze whether CG can help models produce better quality responses by using Reflect CG to guide RG models. Surprisingly, we find that simply prompting GPT3 to “think” about CG generates 30% more quality responses, showing promising benefits to integrating CG into the RG process.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"86 1","pages":"10450-10468"},"PeriodicalIF":0.0,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82071291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Navigating Connected Memories with a Task-oriented Dialog System","authors":"Seungwhan Moon, Satwik Kottur, A. Geramifard, B. Damavandi","doi":"10.48550/arXiv.2211.08462","DOIUrl":"https://doi.org/10.48550/arXiv.2211.08462","url":null,"abstract":"Recent years have seen an increasing trend in the volume of personal media captured by users, thanks to the advent of smartphones and smart glasses, resulting in large media collections. Despite conversation being an intuitive human-computer interface, current efforts focus mostly on single-shot natural language based media retrieval to aid users query their media and re-live their memories. This severely limits the search functionality as users can neither ask follow-up queries nor obtain information without first formulating a single-turn query.In this work, we propose dialogs for connected memories as a powerful tool to empower users to search their media collection through a multi-turn, interactive conversation. Towards this, we collect a new task-oriented dialog dataset COMET, which contains 11.5k user↔assistant dialogs (totalling 103k utterances), grounded in simulated personal memory graphs. We employ a resource-efficient, two-phase data collection pipeline that uses: (1) a novel multimodal dialog simulator that generates synthetic dialog flows grounded in memory graphs, and, (2) manual paraphrasing to obtain natural language utterances. We analyze COMET, formulate four main tasks to benchmark meaningful progress, and adopt state-of-the-art language models as strong baselines, in order to highlight the multimodal challenges captured by our dataset.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"14 1","pages":"2495-2507"},"PeriodicalIF":0.0,"publicationDate":"2022-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81937811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPE: Symmetrical Prompt Enhancement for Fact Probing","authors":"Yiyuan Li, Tong Che, Yezhen Wang, Zhengbao Jiang, Caiming Xiong, Snigdha Chaturvedi","doi":"10.48550/arXiv.2211.07078","DOIUrl":"https://doi.org/10.48550/arXiv.2211.07078","url":null,"abstract":"Pretrained language models (PLMs) have been shown to accumulate factual knowledge during pretraining (Petroni et al. 2019). Recent works probe PLMs for the extent of this knowledge through prompts either in discrete or continuous forms. However, these methods do not consider symmetry of the task: object prediction and subject prediction. In this work, we propose Symmetrical Prompt Enhancement (SPE), a continuous prompt-based method for factual probing in PLMs that leverages the symmetry of the task by constructing symmetrical prompts for subject and object prediction. Our results on a popular factual probing dataset, LAMA, show significant improvement of SPE over previous probing methods.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"4 1","pages":"11689-11698"},"PeriodicalIF":0.0,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84572304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations","authors":"Swarnadeep Saha, Peter Hase, Nazneen Rajani, Mohit Bansal","doi":"10.48550/arXiv.2211.07517","DOIUrl":"https://doi.org/10.48550/arXiv.2211.07517","url":null,"abstract":"Recent work on explainable NLP has shown that few-shot prompting can enable large pre-trained language models (LLMs) to generate grammatical and factual natural language explanations for data labels. In this work, we study the connection between explainability and sample hardness by investigating the following research question – “Are LLMs and humans equally good at explaining data labels for both easy and hard samples?” We answer this question by first collecting human-written explanations in the form of generalizable commonsense rules on the task of Winograd Schema Challenge (Winogrande dataset). We compare these explanations with those generated by GPT-3 while varying the hardness of the test samples as well as the in-context samples. We observe that (1) GPT-3 explanations are as grammatical as human explanations regardless of the hardness of the test samples, (2) for easy examples, GPT-3 generates highly supportive explanations but human explanations are more generalizable, and (3) for hard examples, human explanations are significantly better than GPT-3 explanations both in terms of label-supportiveness and generalizability judgements. We also find that hardness of the in-context examples impacts the quality of GPT-3 explanations. Finally, we show that the supportiveness and generalizability aspects of human explanations are also impacted by sample hardness, although by a much smaller margin than models.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"32 1","pages":"2121-2131"},"PeriodicalIF":0.0,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86051842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WR-One2Set: Towards Well-Calibrated Keyphrase Generation","authors":"Binbin Xie, Xiangpeng Wei, Baosong Yang, Huan Lin, Jun Xie, Xiao-Ling Wang, M. Zhang, Jinsong Su","doi":"10.48550/arXiv.2211.06862","DOIUrl":"https://doi.org/10.48550/arXiv.2211.06862","url":null,"abstract":"Keyphrase generation aims to automatically generate short phrases summarizing an input document. The recently emerged ONE2SET paradigm (Ye et al., 2021) generates keyphrases as a set and has achieved competitive performance. Nevertheless, we observe serious calibration errors outputted by ONE2SET, especially in the over-estimation of ∅ token (means “no corresponding keyphrase”). In this paper, we deeply analyze this limitation and identify two main reasons behind: 1) the parallel generation has to introduce excessive ∅ as padding tokens into training instances; and 2) the training mechanism assigning target to each slot is unstable and further aggravates the ∅ token over-estimation. To make the model well-calibrated, we propose WR-ONE2SET which extends ONE2SET with an adaptive instance-level cost Weighting strategy and a target Re-assignment mechanism. The former dynamically penalizes the over-estimated slots for different instances thus smoothing the uneven training distribution. The latter refines the original inappropriate assignment and reduces the supervisory signals of over-estimated slots. Experimental results on commonly-used datasets demonstrate the effectiveness and generality of our proposed paradigm.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"32 1","pages":"7283-7293"},"PeriodicalIF":0.0,"publicationDate":"2022-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78763857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPT: Improving Prompt Tuning Efficiency via Progressive Training","authors":"Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu","doi":"10.48550/arXiv.2211.06840","DOIUrl":"https://doi.org/10.48550/arXiv.2211.06840","url":null,"abstract":"Recently, prompt tuning (PT) has gained increasing attention as a parameter-efficient way of tuning pre-trained language models (PLMs). Despite extensively reducing the number of tunable parameters and achieving satisfying performance, PT is training-inefficient due to its slow convergence. To improve PT's training efficiency, we first make some novel observations about the prompt transferability of\"partial PLMs\", which are defined by compressing a PLM in depth or width. We observe that the soft prompts learned by different partial PLMs of various sizes are similar in the parameter space, implying that these soft prompts could potentially be transferred among partial PLMs. Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting PT using a small-scale partial PLM, and then progressively expands its depth and width until the full-model size. After each expansion, we recycle the previously learned soft prompts as initialization for the enlarged partial PLM and then proceed PT. We demonstrate the feasibility of FPT on 5 tasks and show that FPT could save over 30% training computations while achieving comparable performance.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"190 1","pages":"6877-6887"},"PeriodicalIF":0.0,"publicationDate":"2022-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83064579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Textual Adversaries with Minimal Perturbation","authors":"Xingyi Zhao, Lu Zhang, Depeng Xu, Shuhan Yuan","doi":"10.48550/arXiv.2211.06571","DOIUrl":"https://doi.org/10.48550/arXiv.2211.06571","url":null,"abstract":"Many word-level adversarial attack approaches for textual data have been proposed in recent studies. However, due to the massive search space consisting of combinations of candidate words, the existing approaches face the problem of preserving the semantics of texts when crafting adversarial counterparts. In this paper, we develop a novel attack strategy to find adversarial texts with high similarity to the original texts while introducing minimal perturbation. The rationale is that we expect the adversarial texts with small perturbation can better preserve the semantic meaning of original texts. Experiments show that, compared with state-of-the-art attack approaches, our approach achieves higher success rates and lower perturbation rates in four benchmark datasets.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"26 1","pages":"4599-4606"},"PeriodicalIF":0.0,"publicationDate":"2022-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83054508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}