Computer Speech and Language: Latest Publications

TadaStride: Using time adaptive strides in audio data for effective downsampling
IF 3.1 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-06-10 | DOI: 10.1016/j.csl.2024.101678
Yoonhyung Lee , Kyomin Jung
Abstract: In this paper, we introduce TadaStride, a new downsampling method for audio data that adaptively adjusts the downsampling ratio across an audio data instance. Unlike previous methods, which use a fixed downsampling ratio, TadaStride preserves more information from task-relevant parts of a data instance by using smaller strides for those parts and larger strides for less relevant parts. We also introduce TadaStride-F, a more efficient version of TadaStride with minimal performance loss. In experiments, we evaluate TadaStride on a range of audio processing tasks. In audio classification, TadaStride and TadaStride-F outperform other widely used standard downsampling methods with comparable memory and time usage. Through various analyses, we provide an understanding of how TadaStride learns effective adaptive strides and how this leads to improved performance. Additional experiments on automatic speech recognition and discrete speech representation learning demonstrate that TadaStride and TadaStride-F consistently outperform other downsampling methods, and we examine how the adaptive strides are learned in these tasks.

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0885230824000615/pdfft?md5=5861e2f1cdebf31ffd61d0cba92056f3&pid=1-s2.0-S0885230824000615-main.pdf
Citations: 0
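TadaStride's implementation is not included in this listing. As a rough illustration of the time-adaptive-stride idea only, the following sketch allocates a fixed output budget by sampling more densely where a relevance signal is high; the function name, the externally supplied relevance scores, and the quantile-based position selection are all assumptions, not the authors' method (which learns the strides).

```python
import numpy as np

def adaptive_downsample(signal, relevance, n_out):
    """Downsample `signal` to `n_out` samples, using smaller effective
    strides (denser sampling) where `relevance` is high.

    Illustrative sketch only: output positions are chosen as equal
    quantiles of the cumulative relevance, so high-relevance regions
    receive proportionally more of the output budget.
    """
    relevance = np.asarray(relevance, dtype=float)
    cdf = np.cumsum(relevance)
    cdf /= cdf[-1]                       # normalized cumulative relevance
    targets = (np.arange(n_out) + 0.5) / n_out
    idx = np.searchsorted(cdf, targets)  # dense where relevance is high
    idx = np.clip(idx, 0, len(signal) - 1)
    return signal[idx], idx

# Toy example: relevance peaks in the middle of the signal, so the
# middle region is kept at a finer time resolution than the edges.
t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 50 * t)
rel = np.exp(-((t - 0.5) ** 2) / 0.01)
y, idx = adaptive_downsample(x, rel, 100)
```

Inspecting `np.diff(idx)` shows small index steps (small strides) around the relevance peak and large steps near the edges.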
A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-06-06 | DOI: 10.1016/j.csl.2024.101677
Heming Wang , Ashutosh Pandey , DeLiang Wang
Abstract: Deep learning has led to dramatic performance improvements in speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs, ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate how components such as window size, loss function, and feature representation affect enhancement performance. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that larger window sizes help dereverberation, and that adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with the proposed techniques outperform other strong enhancement baselines.

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0885230824000603/pdfft?md5=6f57ae0077f304562bdf74000559d71d&pid=1-s2.0-S0885230824000603-main.pdf
Citations: 0
MPSA-DenseNet: A novel deep learning model for English accent classification
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-05-30 | DOI: 10.1016/j.csl.2024.101676
Tianyu Song , Linh Thi Hoai Nguyen , Ton Viet Ta
Abstract: This paper presents three innovative deep learning models for English accent classification: Multi-task Pyramid Split Attention-Densely Convolutional Networks (MPSA-DenseNet), Pyramid Split Attention-Densely Convolutional Networks (PSA-DenseNet), and Multi-task-Densely Convolutional Networks (Multi-DenseNet), which combine multi-task learning and/or the PSA attention module with DenseNet. We applied these models to data collected from five dialects of English across native English-speaking regions (England, the United States) and non-native English-speaking regions (Hong Kong, Germany, India). Our experimental results show a significant improvement in classification accuracy, particularly with MPSA-DenseNet, which outperforms all other models, including the Densely Convolutional Networks (DenseNet) and Efficient Pyramid Squeeze Attention (EPSA) models previously used for accent identification. Our findings indicate that MPSA-DenseNet is a highly promising model for accurately identifying English accents.

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0885230824000597/pdfft?md5=45eac4ef8fe33cc3af54ca5ce1756899&pid=1-s2.0-S0885230824000597-main.pdf
Citations: 0
A novel and secured email classification using deep neural network with bidirectional long short-term memory
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-05-27 | DOI: 10.1016/j.csl.2024.101667
A. Poobalan , K. Ganapriya , K. Kalaivani , K. Parthiban
Abstract: Email data has characteristics that differ from other social media data, such as a large range of answers, formal language, notable length variation, a high degree of anomalies, and indirect relationships. The main goal of this research is to develop a robust and computationally efficient classifier that can distinguish between spam and regular email content. The tests used the benchmark Enron dataset, which is publicly accessible. The six distinct Enron datasets we acquired were combined to generate the final seven Enron datasets, which undergo early preprocessing to remove superfluous sentences. The proposed deep neural network with bidirectional long short-term memory (DNN-BiLSTM) model applies spam labels and examines email documents for spam. In the performance comparison on the seven Enron datasets, DNN-BiLSTM outperforms the other classifiers in terms of accuracy: DNN-BiLSTM and convolutional neural networks classified spam with 96.39% and 98.69% accuracy, respectively, in comparison to other machine learning classifiers. The paper also covers the risks associated with cloud data management and potential security flaws, and presents hybrid encryption as a means of protecting cloud data while preserving privacy, using the hybrid AES-Rabbit encryption algorithm based on symmetric session key exchange.

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0885230824000500/pdfft?md5=93a3ab04f63a63c4343031dc3b1f9eca&pid=1-s2.0-S0885230824000500-main.pdf
Citations: 0
Speech emotion recognition in real static and dynamic human-robot interaction scenarios
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-05-22 | DOI: 10.1016/j.csl.2024.101666
Nicolás Grágeda , Carlos Busso , Eduardo Alvarado , Ricardo García , Rodrigo Mahu , Fernando Huenupan , Néstor Becerra Yoma
Abstract: Speech-based solutions are an appealing alternative for communication in human-robot interaction (HRI). An important challenge in this area is processing distant speech, which is often noisy and affected by reverberation and time-varying acoustic channels. It is important to investigate effective speech solutions, especially in dynamic environments where the robot and the user move, changing the distance and orientation between the speaker and the microphone. This paper addresses this problem in the context of speech emotion recognition (SER), an important task for understanding the intention of a message and the underlying mental state of the user. We propose a novel setup with a PR2 robot that moves while target speech and ambient noise are simultaneously recorded. Our study not only analyzes the detrimental effect of distant speech on SER in this dynamic robot-user setting but also provides solutions to attenuate it. We evaluate two beamforming schemes to spatially filter the speech signal: delay-and-sum (D&S) and minimum variance distortionless response (MVDR). We consider the original training speech recorded in controlled situations, and simulated conditions where the training utterances are processed to match the target acoustic environment, for both a moving robot (dynamic case) and a stationary robot (static case). For SER, we explore two state-of-the-art classifiers: hand-crafted features implemented with the ladder network strategy and learned features implemented with the wav2vec 2.0 representation. MVDR led to a higher signal-to-noise ratio than the basic D&S method. However, both approaches provided very similar average concordance correlation coefficient (CCC) improvements, equal to 116%, on the HRI subsets using the ladder network trained on the original MSP-Podcast training utterances. For the wav2vec 2.0-based model, only D&S led to improvements. Surprisingly, the static and dynamic HRI testing subsets resulted in a similar average CCC. Finally, simulating the acoustic environment in the training dataset provided the highest average CCC scores on the HRI subsets, just 29% and 22% lower than those obtained with the original training/testing utterances, for the ladder network and wav2vec 2.0, respectively.

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0885230824000494/pdfft?md5=10d8a0faec641adaf8be74271eaf5174&pid=1-s2.0-S0885230824000494-main.pdf
Citations: 0
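Of the two beamformers compared in the entry above, delay-and-sum is the simpler: each microphone channel is advanced by its steering delay so the target speech adds coherently when the channels are averaged. A minimal sketch under simplifying assumptions (known integer-sample delays, synthetic signals; not the paper's implementation, which operates on real array recordings):

```python
import numpy as np

def delay_and_sum(mics, delays, fs):
    """Delay-and-sum (D&S) beamforming sketch: align each microphone
    channel by its steering delay (seconds) and average the channels.

    Assumes delays are known and rounds them to whole samples; MVDR,
    the other scheme evaluated in the paper, additionally exploits the
    noise covariance to suppress interference.
    """
    n_mics, n_samples = mics.shape
    out = np.zeros(n_samples)
    for m in range(n_mics):
        shift = int(round(delays[m] * fs))
        out += np.roll(mics[m], -shift)  # advance channel m by its delay
    return out / n_mics

# Toy example: one source reaches 3 mics with known delays; after
# alignment the channels add coherently and the source is recovered.
fs = 16000
t = np.arange(fs) / fs
src = np.sin(2 * np.pi * 440 * t)
delays = [0.0, 0.001, 0.002]
mics = np.stack([np.roll(src, int(d * fs)) for d in delays])
y = delay_and_sum(mics, delays, fs)
```

With uncorrelated noise added per channel, the same averaging attenuates the noise by roughly the number of microphones while leaving the aligned target intact.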
An exploratory characterization of speech- and fine-motor coordination in verbal children with Autism spectrum disorder
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-05-22 | DOI: 10.1016/j.csl.2024.101665
Tanya Talkar , James R. Williamson , Sophia Yuditskaya , Daniel J. Hannon , Hrishikesh M. Rao , Lisa Nowinski , Hannah Saro , Maria Mody , Christopher J. McDougle , Thomas F. Quatieri
Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental disorder often associated with difficulties in speech production and fine-motor tasks, so there is a need for objective measures to assess and understand these challenges in individuals with ASD. In addition, recent research suggests that difficulties with speech production and fine-motor tasks may contribute to language difficulties in ASD. In this paper, we explore the utility of an off-body recording platform from which we administer a speech- and fine-motor protocol to verbal children with ASD and neurotypical controls. We use a correlation-based analysis technique to develop proxy measures of motor coordination from signals derived from recordings of speech- and fine-motor behaviors. Eigenvalues of the resulting correlation matrix are inputs to Gaussian mixture models that discriminate between highly verbal children with ASD and neurotypical controls. These eigenvalues also characterize the complexity (underlying dimensionality) of representative signals of speech- and fine-motor movement dynamics, and form the feature basis for estimating scores on an expressive vocabulary measure. Based on a pilot dataset (15 ASD, 15 controls), features derived from an oral story reading task discriminate between the two groups with AUCs > 0.80 and highlight lower complexity of coordination in children with ASD. Features derived from handwriting and maze tracing tasks led to AUCs of 0.86 and 0.91, whereas features derived from ocular tasks did not aid discrimination between the ASD and neurotypical groups. In addition, features derived from free speech and sustained vowel tasks are strongly correlated with expressive vocabulary scores. These results indicate the promise of correlation-based analysis for elucidating motor differences between individuals with ASD and neurotypical controls.

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0885230824000482/pdfft?md5=6554015220341426a1f33615cd53fd75&pid=1-s2.0-S0885230824000482-main.pdf
Citations: 0
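The correlation-matrix eigenvalue idea in the entry above can be sketched in a few lines: stack time-delayed copies of the recorded signals, compute their correlation matrix, and read the complexity of the coordination off the eigenvalue spectrum (a few dominant eigenvalues imply low underlying dimensionality). The function name, the use of simple circular shifts as delays, and the toy signals are assumptions for illustration, not the authors' exact protocol:

```python
import numpy as np

def eigenspectrum_features(signals, max_lag=5):
    """Build a correlation matrix from time-delayed copies of the input
    signals and return its eigenvalues in descending order.

    The spread of the eigenvalues reflects the underlying dimensionality
    of the joint dynamics: strongly coordinated channels concentrate the
    variance in a few large eigenvalues.
    """
    rows = [np.roll(s, k) for s in signals for k in range(max_lag)]
    X = np.vstack(rows)
    corr = np.corrcoef(X)                     # channel-by-channel correlations
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # descending eigenvalues
    return eigvals

# Toy example: two highly coupled channels yield a spectrum dominated
# by a few large eigenvalues (low complexity / low dimensionality).
rng = np.random.default_rng(0)
a = rng.standard_normal(2000)
b = 0.9 * a + 0.1 * rng.standard_normal(2000)
ev = eigenspectrum_features([a, b], max_lag=3)
```

In the paper these eigenvalues feed Gaussian mixture models for group discrimination and a regression onto expressive vocabulary scores; here they are simply returned as a feature vector.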
A potential relation trigger method for entity-relation quintuple extraction in text with excessive entities
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-05-15 | DOI: 10.1016/j.csl.2024.101650
Xiaojun Xia , Yujiang Liu , Lijun Fu
Abstract: In joint entity and relation extraction, the relationship between two entities is determined by specific words in the source text. These words can be viewed as potential triggers: evidence that explains the relationship but is not explicitly marked. Current models cannot make good use of these potential trigger words to optimize entity and relation components; they only produce separate results. Such models identify the type of relation between two entities mentioned in the source text by encoding the text and the entities. Although some models can generate per-word weights through improved attention mechanisms, those weights are inevitably influenced by irrelevant words, which works against enhancing the influence of the triggers. We propose a joint entity-relation quintuple extraction framework based on the Potential Relation Trigger (PRT) method, which selects the highest-probability word as the prompt at each time step and joins the selected words into relation hints. Specifically, we leverage a polarization mechanism in the probability calculation to avoid non-differentiable points of the functions in our method during selection. We find that this representation improves the relation part of the task given the exact spans of the entities. Extensive experimental results demonstrate that our proposed model achieves state-of-the-art performance on four RE benchmark datasets.

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0885230824000330/pdfft?md5=394e6ed4d34c985c0397218c2f0043ed&pid=1-s2.0-S0885230824000330-main.pdf
Citations: 0
Room impulse response reshaping-based expectation–maximization in an underdetermined reverberant environment
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-05-14 | DOI: 10.1016/j.csl.2024.101664
Yuan Xie , Tao Zou , Junjie Yang , Weijun Sun , Shengli Xie
Abstract: Source separation in an underdetermined reverberant environment is very challenging. The classical approach is based on the expectation–maximization algorithm, but it breaks down in highly reverberant environments, yielding poor or even invalid separation. To remove this restriction, we design a room impulse response reshaping-based expectation–maximization method for source separation in an underdetermined reverberant environment. First, a room impulse response reshaping technique eliminates the influence of audible echo, improving the quality of the received signals. Then, a new mathematical model of time-frequency mixed signals is established to reduce the model-transformation approximation error caused by high reverberation. Furthermore, an improved expectation–maximization method provides real-time update rules for the model parameters, and the sources are then separated using the estimators it produces. Experimental results on separating speech and music mixtures demonstrate that the proposed algorithm achieves better separation performance and much better robustness than popular expectation–maximization methods.

Citations: 0
Zero-Shot Strike: Testing the generalisation capabilities of out-of-the-box LLM models for depression detection
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-05-11 | DOI: 10.1016/j.csl.2024.101663
Julia Ohse , Bakir Hadžić , Parvez Mohammed , Nicolina Peperkorn , Michael Danner , Akihiro Yorita , Naoyuki Kubota , Matthias Rätsch , Youssef Shiban
Abstract: Depression is a significant global health challenge, yet many people suffering from depression remain undiagnosed, and the assessment of depression can be subject to human bias. Natural language processing (NLP) models offer a promising solution. We investigated the potential of four NLP models (BERT, Llama2-13B, GPT-3.5, and GPT-4) for depression detection in clinical interviews. Participants (N = 82) underwent clinical interviews and completed a self-report depression questionnaire. The NLP models inferred depression scores from interview transcripts, and questionnaire cut-off values for depression were used as the classification reference. GPT-4 showed the highest accuracy for depression classification (F1 score 0.73), while zero-shot GPT-3.5 initially performed with low accuracy (0.34), improved to 0.82 after fine-tuning, and achieved 0.68 with clustered data. GPT-4 estimates of PHQ-8 symptom severity correlated strongly (r = 0.71) with true symptom severity. These findings demonstrate the potential of AI models for depression detection, although further research is necessary before widespread deployment can be considered.

Citations: 0
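The questionnaire-cutoff classification step described above reduces to parsing a numeric severity estimate out of a model's free-text answer and thresholding it. A hedged sketch: the helper name and parsing convention (taking the last number in the reply) are hypothetical, since the paper does not specify its output format; 10 is the commonly used PHQ-8 cutoff for clinically relevant depressive symptoms.

```python
import re

PHQ8_CUTOFF = 10  # commonly used PHQ-8 threshold (assumption, not from the paper)

def classify_from_llm_output(text, cutoff=PHQ8_CUTOFF):
    """Extract a numeric PHQ-8 estimate from an LLM's free-text answer
    and apply the questionnaire cutoff.

    Takes the last number in the text, on the assumption that replies
    end with the estimate (e.g. "... Estimated PHQ-8 score: 14").
    Returns None when no number is found.
    """
    nums = re.findall(r"\d+", text)
    if not nums:
        return None
    score = int(nums[-1])
    return {"score": score, "depressed": score >= cutoff}

result = classify_from_llm_output("Estimated PHQ-8 score: 14")
```

In a full pipeline this post-processing would sit behind the actual model call; here only the deterministic scoring step is shown.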
Two in One: A multi-task framework for politeness turn identification and phrase extraction in goal-oriented conversations
IF 4.3 · CAS Tier 3 · Computer Science
Computer Speech and Language | Pub Date: 2024-05-06 | DOI: 10.1016/j.csl.2024.101661
Priyanshu Priya, Mauajama Firdaus, Asif Ekbal
Abstract: Goal-oriented dialogue systems are becoming pervasive in human lives. To facilitate task completion and human participation in practical settings, such systems must have extensive technical knowledge and social understanding. Politeness is a socially desirable trait that plays a crucial role in task-oriented conversations, ensuring better user engagement and satisfaction. To this end, we propose a novel task of politeness analysis in goal-oriented dialogues, consisting of two sub-tasks: politeness turn identification and phrase extraction. Politeness turn identification depends on textual triggers denoting politeness or impoliteness. We propose a Bidirectional Encoder Representations from Transformers-Directional Graph Convolutional Network (BERT-DGCN) based multi-task learning approach that performs both tasks in a unified framework. It employs BERT to encode input turns and a DGCN to encode syntactic information, incorporating word dependencies into the DGCN to improve its ability to represent input utterances and thereby benefit the politeness analysis task. The model classifies each turn of a conversation into one of three pre-defined classes, viz. polite, impolite, and neutral, and simultaneously extracts phrases denoting politeness or impoliteness in that turn. As no such data is readily available, we prepare an English conversational dataset, PoDial, covering mental health counseling and legal aid for crime victims. Experimental results demonstrate that our proposed approach is effective, improving over baselines by 2.04 points in turn identification accuracy and 2.40 points in phrase extraction F1 score on our dataset.

Citations: 0