{"title":"Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models","authors":"Bishwash Khanal, Jeffery M. Capone","doi":"arxiv-2409.11233","DOIUrl":"https://doi.org/arxiv-2409.11233","url":null,"abstract":"Large language models (LLMs) offer powerful capabilities but incur\u0000substantial computational costs, driving the need for efficient compression\u0000techniques. This study evaluates the impact of popular compression methods -\u0000Magnitude Pruning, SparseGPT, and Wanda - on the LLaMA-2-7B model, focusing on\u0000the trade-offs between model size reduction, downstream task performance, and\u0000the role of calibration data. Our findings reveal that while SparseGPT and\u0000Wanda preserve perplexity even at 50% sparsity, they suffer significant\u0000degradation on downstream tasks, highlighting the inadequacy of perplexity as\u0000the sole evaluation metric. To address this, we introduce Jensen-Shannon (JS)\u0000Divergence as a more comprehensive metric that captures nuanced changes in\u0000model behavior post-compression. We further demonstrate that task-specific\u0000calibration data significantly enhances the downstream performance of\u0000compressed models compared to general calibration data. This research\u0000underscores the necessity for diverse evaluation metrics and careful\u0000calibration data selection to fully understand the complexities of LLM\u0000compression and its implications for practical applications.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diversity-grounded Channel Prototypical Learning for Out-of-Distribution Intent Detection","authors":"Bo Liu, Liming Zhan, Yujie Feng, Zexin Lu, Chengqiang Xie, Lei Xue, Xiao-Ming Wu, Albert Y. S. Lam","doi":"arxiv-2409.11114","DOIUrl":"https://doi.org/arxiv-2409.11114","url":null,"abstract":"In the realm of task-oriented dialogue systems, a robust intent detection\u0000mechanism must effectively handle malformed utterances encountered in\u0000real-world scenarios. This study presents a novel fine-tuning framework for\u0000large language models (LLMs) aimed at enhancing in-distribution (ID) intent\u0000classification and out-of-distribution (OOD) intent detection, which utilizes\u0000semantic matching with prototypes derived from ID class names. By harnessing\u0000the highly distinguishable representations of LLMs, we construct semantic\u0000prototypes for each ID class using a diversity-grounded prompt tuning approach.\u0000We rigorously test our framework in a challenging OOD context, where ID and OOD\u0000classes are semantically close yet distinct, referred to as emph{near} OOD\u0000detection. For a thorough assessment, we benchmark our method against the\u0000prevalent fine-tuning approaches. The experimental findings reveal that our\u0000method demonstrates superior performance in both few-shot ID intent\u0000classification and near-OOD intent detection tasks.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LOLA -- An Open-Source Massively Multilingual Large Language Model","authors":"Nikit Srivastava, Denis Kuchelev, Tatiana Moteu, Kshitij Shetty, Michael Roeder, Diego Moussallem, Hamada Zahera, Axel-Cyrille Ngonga Ngomo","doi":"arxiv-2409.11272","DOIUrl":"https://doi.org/arxiv-2409.11272","url":null,"abstract":"This paper presents LOLA, a massively multilingual large language model\u0000trained on more than 160 languages using a sparse Mixture-of-Experts\u0000Transformer architecture. Our architectural and implementation choices address\u0000the challenge of harnessing linguistic diversity while maintaining efficiency\u0000and avoiding the common pitfalls of multilinguality. Our analysis of the\u0000evaluation results shows competitive performance in natural language generation\u0000and understanding tasks. Additionally, we demonstrate how the learned\u0000expert-routing mechanism exploits implicit phylogenetic linguistic patterns to\u0000potentially alleviate the curse of multilinguality. We provide an in-depth look\u0000at the training process, an analysis of the datasets, and a balanced\u0000exploration of the model's strengths and limitations. As an open-source model,\u0000LOLA promotes reproducibility and serves as a robust foundation for future\u0000research. Our findings enable the development of compute-efficient multilingual\u0000models with strong, scalable performance across languages.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs","authors":"Basel Mousi, Nadir Durrani, Fatema Ahmad, Md. Arid Hasan, Maram Hasanain, Tameem Kabbani, Fahim Dalvi, Shammur Absar Chowdhury, Firoj Alam","doi":"arxiv-2409.11404","DOIUrl":"https://doi.org/arxiv-2409.11404","url":null,"abstract":"Arabic, with its rich diversity of dialects, remains significantly\u0000underrepresented in Large Language Models, particularly in dialectal\u0000variations. We address this gap by introducing seven synthetic datasets in\u0000dialects alongside Modern Standard Arabic (MSA), created using Machine\u0000Translation (MT) combined with human post-editing. We present AraDiCE, a\u0000benchmark for Arabic Dialect and Cultural Evaluation. We evaluate LLMs on\u0000dialect comprehension and generation, focusing specifically on low-resource\u0000Arabic dialects. Additionally, we introduce the first-ever fine-grained\u0000benchmark designed to evaluate cultural awareness across the Gulf, Egypt, and\u0000Levant regions, providing a novel dimension to LLM evaluation. Our findings\u0000demonstrate that while Arabic-specific models like Jais and AceGPT outperform\u0000multilingual models on dialectal tasks, significant challenges persist in\u0000dialect identification, generation, and translation. This work contributes ~45K\u0000post-edited samples, a cultural benchmark, and highlights the importance of\u0000tailored training to improve LLM performance in capturing the nuances of\u0000diverse Arabic dialects and cultural contexts. We will release the dialectal\u0000translation models and benchmarks curated in this study.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ProSLM : A Prolog Synergized Language Model for explainable Domain Specific Knowledge Based Question Answering","authors":"Priyesh Vakharia, Abigail Kufeldt, Max Meyers, Ian Lane, Leilani Gilpin","doi":"arxiv-2409.11589","DOIUrl":"https://doi.org/arxiv-2409.11589","url":null,"abstract":"Neurosymbolic approaches can add robustness to opaque neural systems by\u0000incorporating explainable symbolic representations. However, previous\u0000approaches have not used formal logic to contextualize queries to and validate\u0000outputs of large language models (LLMs). We propose systemname{}, a novel\u0000neurosymbolic framework, to improve the robustness and reliability of LLMs in\u0000question-answering tasks. We provide systemname{} with a domain-specific\u0000knowledge base, a logical reasoning system, and an integration to an existing\u0000LLM. This framework has two capabilities (1) context gathering: generating\u0000explainable and relevant context for a given query, and (2) validation:\u0000confirming and validating the factual accuracy of a statement in accordance\u0000with a knowledge base (KB). Our work opens a new area of neurosymbolic\u0000generative AI text validation and user personalization.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring ChatGPT-based Augmentation Strategies for Contrastive Aspect-based Sentiment Analysis","authors":"Lingling Xu, Haoran Xie, S. Joe Qin, Fu Lee Wang, Xiaohui Tao","doi":"arxiv-2409.11218","DOIUrl":"https://doi.org/arxiv-2409.11218","url":null,"abstract":"Aspect-based sentiment analysis (ABSA) involves identifying sentiment towards\u0000specific aspect terms in a sentence and allows us to uncover nuanced\u0000perspectives and attitudes on particular aspects of a product, service, or\u0000topic. However, the scarcity of labeled data poses a significant challenge to\u0000training high-quality models. To address this issue, we explore the potential\u0000of data augmentation using ChatGPT, a well-performing large language model\u0000(LLM), to enhance the sentiment classification performance towards aspect\u0000terms. Specifically, we explore three data augmentation strategies based on\u0000ChatGPT: context-focused, aspect-focused, and context-aspect data augmentation\u0000techniques. Context-focused data augmentation focuses on changing the word\u0000expression of context words in the sentence while keeping aspect terms\u0000unchanged. In contrast, aspect-focused data augmentation aims to change aspect\u0000terms but keep context words unchanged. Context-Aspect data augmentation\u0000integrates the above two data augmentations to generate augmented samples.\u0000Furthermore, we incorporate contrastive learning into the ABSA tasks to improve\u0000performance. Extensive experiments show that all three data augmentation\u0000techniques lead to performance improvements, with the context-aspect data\u0000augmentation strategy performing best and surpassing the performance of the\u0000baseline models.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling","authors":"Xinyue Fang, Zhen Huang, Zhiliang Tian, Minghui Fang, Ziyi Pan, Quntian Fang, Zhihua Wen, Hengyue Pan, Dongsheng Li","doi":"arxiv-2409.11283","DOIUrl":"https://doi.org/arxiv-2409.11283","url":null,"abstract":"LLMs obtain remarkable performance but suffer from hallucinations. Most\u0000research on detecting hallucination focuses on the questions with short and\u0000concrete correct answers that are easy to check the faithfulness. Hallucination\u0000detections for text generation with open-ended answers are more challenging.\u0000Some researchers use external knowledge to detect hallucinations in generated\u0000texts, but external resources for specific scenarios are hard to access. Recent\u0000studies on detecting hallucinations in long text without external resources\u0000conduct consistency comparison among multiple sampled outputs. To handle long\u0000texts, researchers split long texts into multiple facts and individually\u0000compare the consistency of each pairs of facts. However, these methods (1)\u0000hardly achieve alignment among multiple facts; (2) overlook dependencies\u0000between multiple contextual facts. In this paper, we propose a graph-based\u0000context-aware (GCA) hallucination detection for text generations, which aligns\u0000knowledge facts and considers the dependencies between contextual knowledge\u0000triples in consistency comparison. Particularly, to align multiple facts, we\u0000conduct a triple-oriented response segmentation to extract multiple knowledge\u0000triples. To model dependencies among contextual knowledge triple (facts), we\u0000construct contextual triple into a graph and enhance triples' interactions via\u0000message passing and aggregating via RGCN. To avoid the omission of knowledge\u0000triples in long text, we conduct a LLM-based reverse verification via\u0000reconstructing the knowledge triples. Experiments show that our model enhances\u0000hallucination detection and excels all baselines.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chain-of-Thought Prompting for Speech Translation","authors":"Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Żelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg","doi":"arxiv-2409.11538","DOIUrl":"https://doi.org/arxiv-2409.11538","url":null,"abstract":"Large language models (LLMs) have demonstrated remarkable advancements in\u0000language understanding and generation. Building on the success of text-based\u0000LLMs, recent research has adapted these models to use speech embeddings for\u0000prompting, resulting in Speech-LLM models that exhibit strong performance in\u0000automatic speech recognition (ASR) and automatic speech translation (AST). In\u0000this work, we propose a novel approach to leverage ASR transcripts as prompts\u0000for AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM\u0000model consists of a speech encoder and an encoder-decoder structure\u0000Megatron-T5. By first decoding speech to generate ASR transcripts and\u0000subsequently using these transcripts along with encoded speech for prompting,\u0000we guide the speech translation in a two-step process like chain-of-thought\u0000(CoT) prompting. Low-rank adaptation (LoRA) is used for the T5 LLM for model\u0000adaptation and shows superior performance to full model fine-tuning.\u0000Experimental results show that the proposed CoT prompting significantly\u0000improves AST performance, achieving an average increase of 2.4 BLEU points\u0000across 6 En->X or X->En AST tasks compared to speech prompting alone.\u0000Additionally, compared to a related CoT prediction method that predicts a\u0000concatenated sequence of ASR and AST transcripts, our method performs better by\u0000an average of 2 BLEU points.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semformer: Transformer Language Models with Semantic Planning","authors":"Yongjing Yin, Junran Ding, Kai Song, Yue Zhang","doi":"arxiv-2409.11143","DOIUrl":"https://doi.org/arxiv-2409.11143","url":null,"abstract":"Next-token prediction serves as the dominant component in current neural\u0000language models. During the training phase, the model employs teacher forcing,\u0000which predicts tokens based on all preceding ground truth tokens. However, this\u0000approach has been found to create shortcuts, utilizing the revealed prefix to\u0000spuriously fit future tokens, potentially compromising the accuracy of the\u0000next-token predictor. In this paper, we introduce Semformer, a novel method of\u0000training a Transformer language model that explicitly models the semantic\u0000planning of response. Specifically, we incorporate a sequence of planning\u0000tokens into the prefix, guiding the planning token representations to predict\u0000the latent semantic representations of the response, which are induced by an\u0000autoencoder. In a minimal planning task (i.e., graph path-finding), our model\u0000exhibits near-perfect performance and effectively mitigates shortcut learning,\u0000a feat that standard training methods and baseline models have been unable to\u0000accomplish. Furthermore, we pretrain Semformer from scratch with 125M\u0000parameters, demonstrating its efficacy through measures of perplexity,\u0000in-context learning, and fine-tuning on summarization tasks.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Efficiency of Visually Augmented Language Models","authors":"Paula Ontalvilla, Aitor Ormazabal, Gorka Azkune","doi":"arxiv-2409.11148","DOIUrl":"https://doi.org/arxiv-2409.11148","url":null,"abstract":"Despite the impressive performance of autoregressive Language Models (LM) it\u0000has been shown that due to reporting bias, LMs lack visual knowledge, i.e. they\u0000do not know much about the visual world and its properties. To augment LMs with\u0000visual knowledge, existing solutions often rely on explicit images, requiring\u0000time-consuming retrieval or image generation systems. This paper shows that\u0000explicit images are not necessary to visually augment an LM. Instead, we use\u0000visually-grounded text representations obtained from the well-known CLIP\u0000multimodal system. For a fair comparison, we modify VALM, a visually-augmented\u0000LM which uses image retrieval and representation, to work directly with\u0000visually-grounded text representations. We name this new model BLIND-VALM. We\u0000show that BLIND-VALM performs on par with VALM for Visual Language\u0000Understanding (VLU), Natural Language Understanding (NLU) and Language Modeling\u0000tasks, despite being significantly more efficient and simpler. We also show\u0000that scaling up our model within the compute budget of VALM, either increasing\u0000the model or pre-training corpus size, we outperform VALM for all the\u0000evaluation tasks.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}