Latest Articles in Computer Speech and Language

Zero-Shot Strike: Testing the generalisation capabilities of out-of-the-box LLM models for depression detection
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-05-11 DOI: 10.1016/j.csl.2024.101663
Julia Ohse, Bakir Hadžić, Parvez Mohammed, Nicolina Peperkorn, Michael Danner, Akihiro Yorita, Naoyuki Kubota, Matthias Rätsch, Youssef Shiban

Depression is a significant global health challenge, yet many people suffering from depression remain undiagnosed, and the assessment of depression can be subject to human bias. Natural Language Processing (NLP) models offer a promising solution. We investigated the potential of four NLP models (BERT, Llama2-13B, GPT-3.5, and GPT-4) for depression detection in clinical interviews. Participants (N = 82) underwent clinical interviews and completed a self-report depression questionnaire. The NLP models inferred depression scores from interview transcripts, and the questionnaire's cut-off values were used as the classifier for depression. GPT-4 showed the highest accuracy for depression classification (F1 score 0.73), while zero-shot GPT-3.5 initially performed with low accuracy (0.34), improving to 0.82 after fine-tuning and reaching 0.68 with clustered data. GPT-4 estimates of symptom severity (PHQ-8 scores) correlated strongly (r = 0.71) with true symptom severity. These findings demonstrate the potential of AI models for depression detection. However, further research is necessary before widespread deployment can be considered.

Computer Speech and Language, Volume 88, Article 101663
Citations: 0
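The evaluation pipeline in the abstract, where questionnaire cut-off values convert an inferred PHQ-8 score into a binary depression label and performance is reported as an F1 score, can be sketched as follows. The cut-off of 10 is the conventional PHQ-8 threshold and is an assumption here, since the abstract does not state the exact value used.

```python
def classify_depression(phq8_score: float, cutoff: int = 10) -> bool:
    """Binary depression label from a PHQ-8 score (0-24).
    A cut-off of 10 is the conventional PHQ-8 threshold; the paper's
    exact value is not given in the abstract."""
    return phq8_score >= cutoff

def f1_score(y_true, y_pred):
    """F1 for the positive (depressed) class, the metric the abstract reports."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: model-estimated scores vs. questionnaire-derived scores.
true_scores = [4, 12, 15, 2, 9, 20]
pred_scores = [5, 11, 9, 3, 13, 18]
y_true = [classify_depression(s) for s in true_scores]
y_pred = [classify_depression(s) for s in pred_scores]
print(round(f1_score(y_true, y_pred), 2))
```

The same cut-off is applied to both the questionnaire scores (ground truth) and the model-inferred scores, which is how a severity regressor doubles as a binary classifier.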
Two in One: A multi-task framework for politeness turn identification and phrase extraction in goal-oriented conversations
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-05-06 DOI: 10.1016/j.csl.2024.101661
Priyanshu Priya, Mauajama Firdaus, Asif Ekbal

Goal-oriented dialogue systems are becoming pervasive in human lives. To facilitate task completion and human participation in a practical setting, such systems must have extensive technical knowledge and social understanding. Politeness is a socially desirable trait that plays a crucial role in task-oriented conversations, ensuring better user engagement and satisfaction. To this end, we propose a novel task of politeness analysis in goal-oriented dialogues, consisting of two sub-tasks: politeness turn identification and phrase extraction. Politeness turn identification depends on textual triggers denoting politeness or impoliteness. We therefore propose a Bidirectional Encoder Representations from Transformers-Directional Graph Convolutional Network (BERT-DGCN) based multi-task learning approach that performs both tasks in a unified framework. Our approach employs BERT to encode input turns and DGCN to encode syntactic information; incorporating dependencies among words into DGCN improves its capability to represent input utterances and accordingly benefits the politeness analysis task. The model classifies each turn of a conversation into one of three pre-defined classes, viz. polite, impolite, and neutral, and simultaneously extracts phrases denoting politeness or impoliteness in that turn. As no such data is readily available, we prepare a conversational dataset, PoDial, covering mental health counseling and legal aid for crime victims in English. Experimental results demonstrate that our approach is effective, improving over baselines by 2.04 points in turn identification accuracy and 2.40 points in phrase extraction F1-score on our dataset.

Computer Speech and Language, Volume 88, Article 101661
Citations: 0
A cross-attention augmented model for event-triggered context-aware story generation
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-05-06 DOI: 10.1016/j.csl.2024.101662
Chen Tang, Tyler Loakman, Chenghua Lin

Despite recent advancements, existing story generation systems continue to encounter difficulties in effectively incorporating contextual and event features, which greatly influence the quality of generated narratives. To tackle these challenges, we introduce a novel neural generation model, EtriCA, that enhances the relevance and coherence of generated stories by employing a cross-attention mechanism to map context features onto event sequences through residual mapping. This feature-capturing mechanism enables our model to exploit logical relationships between events more effectively during story generation. To further enhance the model, we employ a post-training framework for knowledge enhancement (KeEtriCA) on a large-scale book corpus, allowing EtriCA to adapt to a wider range of data samples; this yields approximately 5% improvement in automatic metrics and over 10% improvement in human evaluation. We conduct extensive experiments, including comparisons with state-of-the-art (SOTA) baseline models, to evaluate the performance of our framework on story generation. The experimental results, encompassing both automated metrics and human assessments, demonstrate the superiority of our model over existing state-of-the-art baselines and underscore its effectiveness in leveraging context and event features to improve the quality of generated narratives.

Computer Speech and Language, Volume 88, Article 101662 (Open Access)
Citations: 0
Addressing subjectivity in paralinguistic data labeling for improved classification performance: A case study with Spanish-speaking Mexican children using data balancing and semi-supervised learning
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-05-01 DOI: 10.1016/j.csl.2024.101652
Daniel Fajardo-Delgado, Isabel G. Vázquez-Gómez, Humberto Pérez-Espinosa

Paralinguistics is an essential component of verbal communication, comprising elements that provide additional information to the language, such as emotional signals. However, the subjective nature of perceiving affective aspects such as emotions poses a significant challenge to developing quality resources for training recognition models of paralinguistic features: labelers may hold different opinions and perceive different emotions from one another, making it difficult to achieve a diverse and sufficient representation of the considered categories. In this study, we focus on the automatic classification of paralinguistic aspects in Spanish-speaking Mexican children of elementary school age. The dataset presents a strong imbalance in all labeled aspects and low agreement between labelers; furthermore, the audio samples are very short, making it challenging to accurately classify affective speech. To address these challenges, we propose a novel method that combines data balancing algorithms and semi-supervised learning to improve the classification performance of the trained models. Our method aims to mitigate the subjectivity involved in labeling paralinguistic data, advancing the development of robust and accurate recognition models of affective aspects in speech.

Computer Speech and Language, Volume 88, Article 101652
Citations: 0
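The two ingredients named in the abstract, data balancing and semi-supervised learning, can be sketched generically as random oversampling followed by self-training on confident pseudo-labels. The nearest-centroid classifier and the margin-based confidence rule below are illustrative stand-ins, not the paper's actual models.

```python
import random
from collections import Counter

def oversample(X, y, seed=0):
    """Naive random oversampling: duplicate minority-class samples
    until every class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    Xb, yb = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            Xb.append(X[i]); yb.append(label)
    return Xb, yb

def centroid_fit(X, y):
    """Per-class mean vectors (a stand-in for a real classifier)."""
    cents = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(c) / len(pts) for c in zip(*pts)]
    return cents

def centroid_predict(cents, x):
    """Predict the nearest centroid; the distance gap to the runner-up
    serves as a crude confidence margin."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    scored = sorted((dist(c, x), label) for label, c in cents.items())
    d0, label = scored[0]
    margin = scored[1][0] - d0 if len(scored) > 1 else float("inf")
    return label, margin

def self_train(X_lab, y_lab, X_unlab, rounds=3, margin_thresh=0.5):
    """Semi-supervised loop: balance the labeled set, then repeatedly
    pseudo-label unlabeled points whose margin is high, add them, refit."""
    X_lab, y_lab = oversample(X_lab, y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        cents = centroid_fit(X_lab, y_lab)
        keep = []
        for x in pool:
            label, margin = centroid_predict(cents, x)
            if margin >= margin_thresh:
                X_lab.append(x); y_lab.append(label)
            else:
                keep.append(x)
        pool = keep
    return centroid_fit(X_lab, y_lab)
```

Low-agreement, imbalanced labels make each labeled example precious; the balancing step keeps the classifier from collapsing onto the majority class, and the margin threshold keeps noisy pseudo-labels out of the training set.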
Applying machine learning to assess emotional reactions to video game content streamed on Spanish Twitch channels
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-04-25 DOI: 10.1016/j.csl.2024.101651
Noemí Merayo, Rosalía Cotelo, Rocío Carratalá-Sáez, Francisco J. Andújar

This research explores for the first time the application of machine learning to detect emotional responses in video game streaming channels, specifically on Twitch, the most widely used platform for broadcasting such content. Analyzing sentiment in gaming contexts is difficult due to the brevity of messages, the lack of context, and the use of informal language, exacerbated in the gaming environment by slang, abbreviations, memes, and jargon. First, a novel Spanish corpus was created from chat messages on Spanish video game Twitch channels, manually labeled for polarity and emotions; it is noteworthy as the first Spanish corpus for analyzing social responses on Twitch. Second, machine learning algorithms were used to classify polarity and emotions, with promising results. The methodology consists of three main steps: (1) extracting Twitch chat messages from Spanish streamers' channels related to gaming events and gameplay; (2) processing and selecting the messages to form the corpus and manually annotating polarity and emotions; and (3) applying machine learning models to detect polarity and emotions in the created corpus. The results show that a Bidirectional Encoder Representations from Transformers (BERT) based model excels with 78% accuracy in polarity detection, while deep learning and Random Forest models reach around 70%. For emotion detection, the BERT model performs best with 68%, followed by deep learning with 55%. Emotion detection is more challenging due to the subjective interpretation of emotions in the complex communicative context of video gaming on platforms such as Twitch. The use of supervised learning techniques, together with the rigorous corpus labeling process and the subsequent corpus pre-processing methodology, helped mitigate these challenges, and the algorithms performed well. The main limitations of the research involve the balance of category and video game representation. Finally, the integration of machine learning in video games and on Twitch is innovative in allowing the identification of viewers' emotions on streamers' channels. This could bring benefits such as a better understanding of audience sentiment, improved content and audience retention, personalized recommendations, and detection of toxic behavior in chats.

Computer Speech and Language, Volume 88, Article 101651 (Open Access)
Citations: 0
IKDSumm: Incorporating key-phrases into BERT for extractive disaster tweet summarization
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-04-16 DOI: 10.1016/j.csl.2024.101649
Piyush Kumar Garg, Roshni Chakraborty, Srishti Gupta, Sourav Kumar Dandapat

Online social media platforms, such as Twitter, are among the most valuable sources of information during disaster events. Humanitarian organizations, government agencies, and volunteers rely on a concise compilation of such information for effective disaster management. Existing methods to make such compilations are mostly generic summarization approaches that do not exploit domain knowledge. In this paper, we propose a disaster-specific tweet summarization framework, IKDSumm, which first identifies the crucial information in each tweet related to a disaster through that tweet's key-phrases. We identify these key-phrases by utilizing domain knowledge of disasters (from an existing ontology) without any human intervention, and then use them to automatically generate a summary of the tweets. Given tweets related to a disaster, IKDSumm thus ensures fulfillment of the key summarization objectives (information coverage, relevance, and diversity) without human intervention. We evaluate IKDSumm against 8 state-of-the-art techniques on 12 disaster datasets; the results show that IKDSumm outperforms existing techniques by approximately 2-79% in terms of ROUGE-N F1-score.

Computer Speech and Language, Volume 87, Article 101649
Citations: 0
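The summarization objectives IKDSumm targets (coverage, relevance, diversity) can be illustrated with a greedy selector that scores tweets by the disaster key-phrases they contain. The small key-phrase list below stands in for the disaster ontology the paper uses; the scoring rule is an illustrative simplification, not the paper's BERT-based method.

```python
def keyphrases_in(tweet, keyphrases):
    """Return the distinct key-phrases a tweet mentions."""
    text = tweet.lower()
    return {kp for kp in keyphrases if kp in text}

def greedy_summary(tweets, keyphrases, k=2):
    """Greedily pick up to k tweets, each time taking the tweet that adds
    the most not-yet-covered key-phrases. Maximizing marginal coverage
    rewards relevance while penalizing redundancy (diversity)."""
    covered, summary = set(), []
    candidates = list(tweets)
    for _ in range(k):
        best, best_gain = None, 0
        for t in candidates:
            gain = len(keyphrases_in(t, keyphrases) - covered)
            if gain > best_gain:
                best, best_gain = t, gain
        if best is None:  # nothing adds new information
            break
        summary.append(best)
        covered |= keyphrases_in(best, keyphrases)
        candidates.remove(best)
    return summary

# Hypothetical mini-ontology and tweets for demonstration.
ontology = ["earthquake", "casualties", "rescue", "shelter"]
tweets = [
    "Earthquake hits the coast, rescue teams deployed",
    "Rescue efforts continue through the night",
    "Shelter opened for displaced families, casualties reported",
]
print(greedy_summary(tweets, ontology, k=2))
```

Note how the second tweet, which only repeats already-covered information ("rescue"), is never selected: redundancy contributes zero marginal gain.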
Yes, I am afraid of the sharks and also wild lions!: A multitask framework for enhancing dialogue generation via knowledge and emotion grounding
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-04-16 DOI: 10.1016/j.csl.2024.101645
Deeksha Varshney, Asif Ekbal

Current end-to-end neural conversation models inherently lack the capability to generate coherent, engaging responses. Efforts to boost informativeness have an adverse effect on emotional and factual accuracy, as validated by several sequence-based models. While these issues can be alleviated by access to emotion labels and background knowledge, there is no guarantee of relevance and informativeness in the generated responses. In real dialogue corpora, informative words such as named entities and words that carry specific emotions are often infrequent and hard to model, and one primary challenge for a dialogue system is how to promote the model's capability to generate high-quality responses containing those informative words. Furthermore, earlier approaches depended on straightforward concatenation techniques, which lack robust representation capabilities, to account for human emotions. To address this problem, we propose a novel multitask hierarchical encoder-decoder model that enhances multi-turn dialogue response generation by incorporating external textual knowledge and relevant emotions. Experimental results on a benchmark dataset indicate that our model is superior to competitive baselines in both automatic and human evaluation.

Computer Speech and Language, Volume 87, Article 101645
Citations: 0
Improving cross-lingual low-resource speech recognition by Task-based Meta PolyLoss
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-04-09 DOI: 10.1016/j.csl.2024.101648
Yaqi Chen, Hao Zhang, Xukui Yang, Wenlin Zhang, Dan Qu

Multilingual meta learning has emerged as a promising paradigm for transferring knowledge from source languages to facilitate the learning of low-resource target languages. Loss functions are a type of meta-knowledge crucial to the effective training of neural networks. However, misalignment between the loss function and the learning paradigm of meta learning degrades the network's performance. To address this challenge, we propose a new method called Task-based Meta PolyLoss (TMPL) for meta learning. By regarding speech recognition tasks as normal samples and applying PolyLoss to the meta loss function, TMPL can be written as a linear combination of polynomial functions based on the task query loss. Theoretical analysis shows that TMPL improves meta learning by enabling attention adjustment across different tasks, which can be tailored to different datasets. Experiments on three datasets demonstrate that gradient-based meta learning methods achieve superior performance with TMPL. Furthermore, our experiments validate that the task-based loss function effectively mitigates the misalignment issue.

Computer Speech and Language, Volume 87, Article 101648
Citations: 0
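The abstract describes TMPL as a linear combination of polynomial functions of the task query loss. In the PolyLoss family this rests on the expansion of cross-entropy, -log(p) = sum_j (1-p)^j / j, whose leading coefficients are then perturbed. A minimal, dependency-free sketch of that idea; the epsilon values and the per-task averaging are illustrative assumptions, not the paper's exact formulation.

```python
import math

def poly_loss(p, epsilons):
    """Poly-N loss: cross-entropy plus perturbations of the leading
    polynomial coefficients, L = -log(p) + sum_j eps_j * (1-p)**(j+1).
    The epsilons are tunable hyperparameters; values here are illustrative."""
    ce = -math.log(p)
    return ce + sum(eps * (1 - p) ** (j + 1) for j, eps in enumerate(epsilons))

def tmpl_loss(task_probs, epsilons=(1.0,)):
    """Task-based variant (a sketch): treat each task's query prediction
    as one 'sample' and average the poly loss over tasks, so the extra
    polynomial terms put more weight on harder tasks."""
    return sum(poly_loss(p, epsilons) for p in task_probs) / len(task_probs)

# With epsilons=(1.0,), the loss adds a (1-p) term to cross-entropy,
# upweighting low-confidence (hard) tasks more than plain cross-entropy.
print(round(tmpl_loss([0.9, 0.5]), 4))
```

The single scalar `p` per task is of course a simplification of a full speech recognition query loss, but it shows the mechanism: the polynomial terms reshape how much gradient each task contributes.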
SEBGM: Sentence Embedding Based on Generation Model with multi-task learning
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-04-06 DOI: 10.1016/j.csl.2024.101647
Qian Wang, Weiqi Zhang, Tianyi Lei, Yu Cao, Dezhong Peng, Xu Wang

Sentence embedding, which aims to learn an effective representation of a sentence, is significant for downstream tasks. Recently, most sentence embedding methods have achieved encouraging results by using contrastive learning and pre-trained models. However, on the one hand, these methods rely on discrete data augmentation to obtain positive samples for contrastive learning, which can distort the original semantics of the sentences. On the other hand, most methods directly adopt the contrastive frameworks of computer vision, which can constrain contrastive training because text data is discrete and sparse compared with image data. To solve these issues, we design a novel contrastive framework based on a generation model with multi-task learning, trained by supervised contrastive learning on a natural language inference (NLI) dataset, to obtain meaningful sentence embeddings (SEBGM). SEBGM uses multi-task learning to enhance the use of word-level and sentence-level semantic information in the samples; in this way, its positive samples come from NLI rather than from data augmentation. Extensive experiments show that SEBGM advances the state of the art in sentence embedding on semantic textual similarity (STS) tasks.

Computer Speech and Language, Volume 87, Article 101647
Citations: 0
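The supervised contrastive step described above, in which each premise's entailed hypothesis from NLI serves as the positive and the other in-batch hypotheses serve as negatives, amounts to an InfoNCE-style objective. A dependency-free numeric sketch; the toy 2-d vectors stand in for encoder outputs, and the temperature value is a common default, not the paper's setting.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nli_contrastive_loss(premises, hypotheses, temperature=0.05):
    """InfoNCE over a batch: the i-th hypothesis (entailed by the i-th
    premise) is the positive; every other hypothesis is a negative."""
    loss = 0.0
    for i, p in enumerate(premises):
        sims = [math.exp(cosine(p, h) / temperature) for h in hypotheses]
        loss += -math.log(sims[i] / sum(sims))
    return loss / len(premises)

# Toy embeddings: entailed pairs point the same way, mismatched pairs do not.
premises = [(1.0, 0.0), (0.0, 1.0)]
aligned = [(0.9, 0.1), (0.1, 0.9)]
shuffled = [(0.1, 0.9), (0.9, 0.1)]
print(nli_contrastive_loss(premises, aligned) < nli_contrastive_loss(premises, shuffled))
```

Because the positives are genuine entailment pairs written by annotators, no augmentation of the input sentence is needed, which is exactly the distortion problem the abstract raises.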
A flexible BERT model enabling width- and depth-dynamic inference
IF 4.3 · Computer Science (CAS Tier 3)
Computer Speech and Language Pub Date: 2024-04-04 DOI: 10.1016/j.csl.2024.101646
Ting Hu, Christoph Meinel, Haojin Yang

Fine-tuning and inference on large language models like BERT have become increasingly expensive in terms of memory and computation. The recently proposed computation-flexible BERT models facilitate deployment in varied computational environments. Training such flexible BERT models involves jointly optimizing multiple BERT subnets, which inevitably interfere with one another; moreover, the performance of the large subnets is limited by the performance gap between the smallest subnet and the supernet, despite efforts to enhance the smaller subnets. We therefore propose layer-wise Neural Grafting to boost the BERT subnets, especially the larger ones. The proposed method improves the average performance of the subnets on six GLUE tasks and boosts the supernet on all GLUE tasks and the SQuAD dataset. Based on the boosted subnets, we further build an inference framework enabling practical width- and depth-dynamic inference for different inputs by combining width-dynamic gating modules with early-exit off-ramps in the depth dimension. Experimental results show that the proposed framework achieves a better dynamic inference range than other methods in trading off performance against computational complexity on four GLUE tasks and SQuAD. In particular, our best-tradeoff inference result outperforms other fixed-size models with a similar amount of computation. Compared to BERT-Base, the proposed inference framework yields a 1.3-point improvement in the average GLUE score and a 2.2-point increase in F1 on SQuAD, while reducing computation by around 45%.

Computer Speech and Language, Volume 87, Article 101646 (Open Access)
Citations: 0
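Depth-dynamic inference of the kind described, with an early-exit off-ramp after each layer that is taken once its classifier is confident enough, can be sketched as follows. The toy layers and off-ramps are placeholders for the fine-tuned BERT components, and width-dynamic gating is omitted for brevity.

```python
def dynamic_depth_infer(x, layers, offramps, threshold=0.9):
    """Run encoder layers one at a time; after each, an off-ramp
    classifier produces (label, confidence). Exit early once the
    confidence clears the threshold, skipping the remaining layers."""
    hidden = x
    for depth, (layer, offramp) in enumerate(zip(layers, offramps), start=1):
        hidden = layer(hidden)
        label, confidence = offramp(hidden)
        if confidence >= threshold:
            return label, depth  # early exit saves the later layers
    return label, depth  # fell through: full-depth prediction

# Toy stand-ins: each "layer" nudges a scalar hidden state, and the
# off-ramp's confidence grows with the magnitude of that state.
layers = [lambda h: h + 0.3] * 4
offramps = [lambda h: ("positive" if h > 0 else "negative", min(1.0, abs(h)))] * 4
label, depth_used = dynamic_depth_infer(0.2, layers, offramps, threshold=0.9)
print(label, depth_used)
```

Easy inputs exit after a few layers while hard ones run the full stack, which is where the average compute savings the abstract reports come from.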