{"title":"An exploratory characterization of speech- and fine-motor coordination in verbal children with Autism spectrum disorder","authors":"Tanya Talkar , James R. Williamson , Sophia Yuditskaya , Daniel J. Hannon , Hrishikesh M. Rao , Lisa Nowinski , Hannah Saro , Maria Mody , Christopher J. McDougle , Thomas F. Quatieri","doi":"10.1016/j.csl.2024.101665","DOIUrl":"10.1016/j.csl.2024.101665","url":null,"abstract":"<div><p>Autism spectrum disorder (ASD) is a neurodevelopmental disorder often associated with difficulties in speech production and fine-motor tasks. Thus, there is a need to develop objective measures to assess and understand speech production and other fine-motor challenges in individuals with ASD. In addition, recent research suggests that difficulties with speech production and fine-motor tasks may contribute to language difficulties in ASD. In this paper, we explore the utility of an off-body recording platform, from which we administer a speech- and fine-motor protocol to verbal children with ASD and neurotypical controls. We utilize a correlation-based analysis technique to develop proxy measures of motor coordination from signals derived from recordings of speech- and fine-motor behaviors. Eigenvalues of the resulting correlation matrix are inputs to Gaussian Mixture Models to discriminate between highly-verbal children with ASD and neurotypical controls. These eigenvalues also characterize the complexity (underlying dimensionality) of representative signals of speech- and fine-motor movement dynamics, and form the feature basis to estimate scores on an expressive vocabulary measure. Based on a pilot dataset (15 ASD and 15 controls), features derived from an oral story reading task are used in discriminating between the two groups with AUCs > 0.80, and highlight lower complexity of coordination in children with ASD. 
Features derived from handwriting and maze tracing tasks led to AUCs of 0.86 and 0.91, respectively; however, features derived from ocular tasks did not aid in discrimination between the ASD and neurotypical groups. In addition, features derived from free speech and sustained vowel tasks are strongly correlated with expressive vocabulary scores. These results indicate the promise of a correlation-based analysis in elucidating motor differences between individuals with ASD and neurotypical controls.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"89 ","pages":"Article 101665"},"PeriodicalIF":4.3,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000482/pdfft?md5=6554015220341426a1f33615cd53fd75&pid=1-s2.0-S0885230824000482-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141139333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
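The pipeline described in this abstract, time-delay correlation matrices whose eigenvalue spectra feed a per-class generative classifier, can be sketched as follows. This is a minimal illustration with synthetic signals; the delay structure, channel count, and the single-component diagonal Gaussian used as a stand-in for the paper's Gaussian Mixture Models are all assumptions.

```python
import numpy as np

def eigen_features(signals, delays=(0, 1, 2, 3)):
    """Eigenvalues of a channel-delay correlation matrix, sorted descending.

    signals: (n_channels, n_samples) array of speech/fine-motor time series
    (synthetic here; the paper's actual recorded signals differ).
    Smaller high-order eigenvalues indicate lower complexity (lower
    underlying dimensionality) of the coordination.
    """
    stacked = np.vstack([np.roll(sig, d) for sig in signals for d in delays])
    corr = np.corrcoef(stacked)            # (C*D, C*D) correlation matrix
    return np.linalg.eigvalsh(corr)[::-1]  # descending eigenvalues

def fit_gaussian(X):
    """Diagonal Gaussian per class: a one-component stand-in for the GMMs."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

rng = np.random.default_rng(0)
X = np.array([eigen_features(rng.standard_normal((3, 400))) for _ in range(20)])
y = np.array([0] * 10 + [1] * 10)          # toy ASD vs. control labels
models = [fit_gaussian(X[y == c]) for c in (0, 1)]
pred = np.array([np.argmax([log_likelihood(x, *m) for m in models]) for x in X])
```

Classification assigns each eigenvalue feature vector to the class whose fitted model gives it the higher log-likelihood, mirroring the discriminative use of GMMs described above.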
{"title":"A potential relation trigger method for entity-relation quintuple extraction in text with excessive entities","authors":"Xiaojun Xia , Yujiang Liu , Lijun Fu","doi":"10.1016/j.csl.2024.101650","DOIUrl":"10.1016/j.csl.2024.101650","url":null,"abstract":"<div><p>In the task of joint entity and relation extraction, the relationship between two entities is conveyed by specific words in their source text. These words can be viewed as potential triggers: the evidence that explains the relationship, although they are not explicitly marked. However, current models cannot make good use of these potential trigger words to refine the extracted entities and relations; they can only produce separate results. These models aim to identify the type of relation between two entities mentioned in the source text by encoding the text and the entities. Although some models can generate a weight for every single word through improved attention mechanisms, these weights are inevitably influenced by irrelevant words, which works against enhancing the influence of the triggers. We propose a joint entity-relation quintuple extraction framework based on the Potential Relation Trigger (PRT) method, which selects the highest-probability word as a prompt at every time step and joins the selected words together as relation hints. Specifically, we leverage a polarization mechanism in the probability calculation to avoid non-differentiable points in the selection functions. We find that the resulting representations improve the performance of the relation part given the exact spans of the entities. 
Extensive experimental results demonstrate the effectiveness of our proposed model, which achieves state-of-the-art performance on four RE benchmark datasets.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"89 ","pages":"Article 101650"},"PeriodicalIF":4.3,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000330/pdfft?md5=394e6ed4d34c985c0397218c2f0043ed&pid=1-s2.0-S0885230824000330-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141047279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Room impulse response reshaping-based expectation–maximization in an underdetermined reverberant environment","authors":"Yuan Xie , Tao Zou , Junjie Yang , Weijun Sun , Shengli Xie","doi":"10.1016/j.csl.2024.101664","DOIUrl":"10.1016/j.csl.2024.101664","url":null,"abstract":"<div><p>Source separation in an underdetermined reverberant environment is a very challenging problem. The classical approach is based on the expectation–maximization algorithm; however, it degrades in highly reverberant environments, resulting in poor or even invalid separation performance. To remove this restriction, a room impulse response reshaping-based expectation–maximization method is designed for source separation in an underdetermined reverberant environment. First, a room impulse response reshaping technique is designed to eliminate the influence of audible echoes, improving the quality of the received signals. Then, a new mathematical model of time-frequency mixed signals is established to reduce the model-transformation approximation error caused by high reverberation. Furthermore, an improved expectation–maximization method with real-time update rules for the model parameters is proposed, and the sources are then separated using the estimators it provides. 
Experimental results on source separation of speech and music mixtures demonstrate that the proposed algorithm achieves better separation performance while maintaining much stronger robustness than popular expectation–maximization methods.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"88 ","pages":"Article 101664"},"PeriodicalIF":4.3,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141047534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
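The expectation–maximization updates at the core of such methods can be illustrated on a much simpler model. The sketch below runs EM on a two-component 1-D Gaussian mixture; it is a generic textbook EM loop, not the paper's time-frequency reverberant model.

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture (generic illustration)."""
    # Deterministic initialization from the data quantiles.
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    var = np.full(2, x.var())
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from responsibilities.
        nk = resp.sum(axis=0)
        pi, mu = nk / len(x), (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
pi, mu, var = em_gmm_1d(x)   # mu should recover roughly (-3, 3)
```

The improvement described in the abstract replaces this batch M-step with real-time update rules and a reshaped room impulse response; the E/M alternation itself is unchanged.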
{"title":"Zero-Shot Strike: Testing the generalisation capabilities of out-of-the-box LLM models for depression detection","authors":"Julia Ohse , Bakir Hadžić , Parvez Mohammed , Nicolina Peperkorn , Michael Danner , Akihiro Yorita , Naoyuki Kubota , Matthias Rätsch , Youssef Shiban","doi":"10.1016/j.csl.2024.101663","DOIUrl":"10.1016/j.csl.2024.101663","url":null,"abstract":"<div><p>Depression is a significant global health challenge. Still, many people suffering from depression remain undiagnosed. Furthermore, the assessment of depression can be subject to human bias. Natural Language Processing (NLP) models offer a promising solution. We investigated the potential of four NLP models (BERT, Llama2-13B, GPT-3.5, and GPT-4) for depression detection in clinical interviews. Participants (N = 82) underwent clinical interviews and completed a self-report depression questionnaire. NLP models inferred depression scores from interview transcripts. Questionnaire cut-off values were used as the classification criterion for depression. GPT-4 showed the highest accuracy for depression classification (F1 score 0.73), while zero-shot GPT-3.5 initially performed with low accuracy (0.34), improved to 0.82 after fine-tuning, and achieved 0.68 with clustered data. GPT-4 estimates of symptom severity (PHQ-8 scores) correlated strongly (r = 0.71) with true symptom severity. These findings demonstrate the potential of AI models for depression detection. 
However, further research is necessary before widespread deployment can be considered.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"88 ","pages":"Article 101663"},"PeriodicalIF":4.3,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141043762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
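The two headline numbers in this abstract, an F1 score against questionnaire cut-off labels and a Pearson correlation between estimated and true PHQ-8 scores, can be computed as follows (toy values, not the study's data):

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1 (1 = above the depression cut-off)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# Pearson r between model-estimated and questionnaire PHQ-8 severity
# scores (illustrative numbers only).
est = np.array([4.0, 10.0, 15.0, 7.0, 20.0, 2.0])
true = np.array([5.0, 12.0, 14.0, 6.0, 18.0, 3.0])
r = np.corrcoef(est, true)[0, 1]
```

F1 balances precision and recall, which matters when depressed and non-depressed participants are unevenly represented; the correlation measures agreement on severity rather than just the binary label.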
{"title":"Two in One: A multi-task framework for politeness turn identification and phrase extraction in goal-oriented conversations","authors":"Priyanshu Priya, Mauajama Firdaus, Asif Ekbal","doi":"10.1016/j.csl.2024.101661","DOIUrl":"https://doi.org/10.1016/j.csl.2024.101661","url":null,"abstract":"<div><p>Goal-oriented dialogue systems are becoming pervasive in human lives. To facilitate task completion and human participation in a practical setting, such systems must have extensive technical knowledge and social understanding. Politeness is a socially desirable trait that plays a crucial role in task-oriented conversations for ensuring better user engagement and satisfaction. To this end, we propose a novel task of politeness analysis in goal-oriented dialogues. Politeness analysis consists of two sub-tasks: politeness turn identification and phrase extraction. Politeness turn identification depends on textual triggers denoting politeness or impoliteness. In this regard, we propose a Bidirectional Encoder Representations from Transformers-Directional Graph Convolutional Network (BERT-DGCN) based multi-task learning approach that performs the turn identification and phrase extraction tasks in a unified framework. Our proposed approach employs BERT to encode input turns and DGCN to encode syntactic information; dependencies among words are incorporated into DGCN to improve its capability to represent input utterances and thereby benefit the politeness analysis task. Our proposed model classifies each turn of a conversation into one of three pre-defined classes, <em>viz.</em> polite, impolite and neutral, and simultaneously extracts phrases denoting politeness or impoliteness in that turn. As no such data is readily available, we prepare a conversational dataset, <strong><em>PoDial</em></strong>, for mental health counseling and legal aid for crime victims in English for our experiments. 
Experimental results demonstrate that our proposed approach is effective, achieving a 2.04-point improvement in turn identification accuracy and a 2.40-point improvement in phrase extraction F1-score over baselines on our dataset.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"88 ","pages":"Article 101661"},"PeriodicalIF":4.3,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140947812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
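A unified multi-task objective of this kind, a turn-level classification loss plus a token-level phrase-tagging loss over a shared encoder, can be sketched as below. The BIO tag set, the logit values, and the weighting factor alpha are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy between rows of logits and integer labels."""
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

# Turn-level politeness classification (polite / impolite / neutral) plus
# token-level phrase tagging (B/I/O), combined with a weighting factor
# alpha. All logits here are toy values standing in for encoder outputs.
turn_logits = np.array([[2.0, 0.1, -1.0]])   # 1 turn, 3 classes
turn_labels = np.array([0])                  # gold: polite
tok_logits = np.random.default_rng(5).standard_normal((6, 3))  # 6 tokens
tok_labels = np.array([0, 1, 2, 2, 0, 1])    # gold B/I/O tags
alpha = 0.5
loss = cross_entropy(turn_logits, turn_labels) + alpha * cross_entropy(tok_logits, tok_labels)
```

Training on the summed loss lets the shared encoder trade off the two sub-tasks, which is the usual motivation for unified frameworks like the one described.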
{"title":"A cross-attention augmented model for event-triggered context-aware story generation","authors":"Chen Tang , Tyler Loakman , Chenghua Lin","doi":"10.1016/j.csl.2024.101662","DOIUrl":"https://doi.org/10.1016/j.csl.2024.101662","url":null,"abstract":"<div><p>Despite recent advancements, existing story generation systems continue to encounter difficulties in effectively incorporating contextual and event features, which greatly influence the quality of generated narratives. To tackle these challenges, we introduce a novel neural generation model, EtriCA, that enhances the relevance and coherence of generated stories by employing a cross-attention mechanism to map context features onto event sequences through residual mapping. This feature capturing mechanism enables our model to exploit logical relationships between events more effectively during the story generation process. To further enhance our proposed model, we employ a post-training framework for knowledge enhancement (KeEtriCA) on a large-scale book corpus, which allows EtriCA to adapt to a wider range of data samples and yields approximately 5% improvement in automatic metrics and over 10% improvement in human evaluation. We conduct extensive experiments, including comparisons with state-of-the-art (SOTA) baseline models, to evaluate the performance of our framework on story generation. The experimental results, encompassing both automatic metrics and human assessments, demonstrate the superiority of our model over existing SOTA baselines. 
These results underscore the effectiveness of our model in leveraging context and event features to improve the quality of generated narratives.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"88 ","pages":"Article 101662"},"PeriodicalIF":4.3,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000457/pdfft?md5=6b2981aa01c6fa0779df7e400f7d036a&pid=1-s2.0-S0885230824000457-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141077706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
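The core mechanism named in this abstract, cross-attention that maps context features onto an event sequence with a residual connection, can be sketched as follows. The shapes, random weights, and single attention head are illustrative assumptions, not EtriCA's actual architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_with_residual(events, context, d_k=16, seed=0):
    """Events query the context; the attended context is added back
    onto the event representations (the residual mapping)."""
    rng = np.random.default_rng(seed)
    d = events.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) * 0.1 for _ in range(3))
    Wo = rng.standard_normal((d_k, d)) * 0.1
    Q, K, V = events @ Wq, context @ Wk, context @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (n_events, n_context) weights
    return events + attn @ V @ Wo            # residual connection

events = np.random.default_rng(2).standard_normal((5, 32))   # event sequence
context = np.random.default_rng(3).standard_normal((7, 32))  # context features
out = cross_attention_with_residual(events, context)
```

Because the attended context is added rather than concatenated, each event vector keeps its original content while being enriched with whatever context positions it attends to.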
{"title":"Addressing subjectivity in paralinguistic data labeling for improved classification performance: A case study with Spanish-speaking Mexican children using data balancing and semi-supervised learning","authors":"Daniel Fajardo-Delgado , Isabel G. Vázquez-Gómez , Humberto Pérez-Espinosa","doi":"10.1016/j.csl.2024.101652","DOIUrl":"https://doi.org/10.1016/j.csl.2024.101652","url":null,"abstract":"<div><p>Paralinguistics is an essential component of verbal communication, comprising elements that provide additional information to the language, such as emotional signals. However, the subjective nature of perceiving affective aspects, such as emotions, poses a significant challenge to the development of quality resources for training recognition models of paralinguistic features. Labelers may hold different opinions and perceive emotions differently from one another, making it difficult to achieve a diverse and sufficient representation of the considered categories. In this study, we focused on the automatic classification of paralinguistic aspects in Spanish-speaking Mexican children of elementary school age. However, the dataset presents a strong imbalance in all labeled aspects and low agreement between the labelers. Furthermore, the audio samples were too short, making it challenging to accurately classify affective speech. To address these challenges, we propose a novel method that combines data balancing algorithms and semi-supervised learning to improve the classification performance of the trained models. 
Our method aims to mitigate the subjectivity involved in labeling paralinguistic data, thus advancing the development of robust and accurate recognition models of affective aspects in speech.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"88 ","pages":"Article 101652"},"PeriodicalIF":4.3,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140879895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
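The combination described above, a data balancing step followed by semi-supervised learning, can be sketched as below. Random oversampling and a nearest-centroid self-training loop are simple stand-ins chosen for illustration; the paper's actual balancing algorithms and classifiers are not specified here.

```python
import numpy as np

def oversample(X, y, seed=0):
    """Naive random oversampling so every class matches the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), counts.max(), replace=True)
                          for c in classes])
    return X[idx], y[idx]

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, rounds=3):
    """Self-training with a nearest-centroid classifier (classes 0 and 1):
    each round, confidently pseudo-labeled samples join the labeled pool."""
    for _ in range(rounds):
        centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in (0, 1)])
        if len(X_unlab) == 0:
            break
        d = np.linalg.norm(X_unlab[:, None, :] - centroids[None], axis=2)
        proba = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)  # soft scores
        conf = proba.max(axis=1) >= threshold
        if not conf.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[conf]])
        y_lab = np.concatenate([y_lab, proba[conf].argmax(axis=1)])
        X_unlab = X_unlab[~conf]
    return centroids

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (6, 2))])  # imbalanced
y = np.array([0] * 30 + [1] * 6)
Xb, yb = oversample(X, y)
centroids = self_train(Xb, yb, rng.normal(0, 2.5, (50, 2)))
```

Balancing first prevents the majority class from dominating the pseudo-labeling step, which is the usual rationale for pairing the two techniques.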
{"title":"Applying machine learning to assess emotional reactions to video game content streamed on Spanish Twitch channels","authors":"Noemí Merayo , Rosalía Cotelo , Rocío Carratalá-Sáez , Francisco J. Andújar","doi":"10.1016/j.csl.2024.101651","DOIUrl":"10.1016/j.csl.2024.101651","url":null,"abstract":"<div><p>This research explores for the first time the application of machine learning to detect emotional responses in video game streaming channels, specifically on Twitch, the most widely used platform for broadcasting content. Analyzing sentiment in gaming contexts is difficult due to the brevity of messages, the lack of context, and the use of informal language, which is exacerbated in the gaming environment by slang, abbreviations, memes, and jargon. First, a novel Spanish corpus was created from chat messages on Spanish video game Twitch channels, manually labeled for polarity and emotions. It is noteworthy as the first Spanish corpus for analyzing social responses on Twitch. Second, machine learning algorithms were used to classify polarity and emotions, with promising results. The methodology followed in this work consists of three main steps: (1) Extracting Twitch chat messages from Spanish streamers’ channels related to gaming events and gameplays; (2) Processing and selecting the messages to form the corpus and manually annotating polarity and emotions; and (3) Applying machine learning models to detect polarity and emotions in the created corpus. The results have shown that a Bidirectional Encoder Representation from Transformers (BERT) based model excels with 78% accuracy in polarity detection, while deep learning and Random Forest models reach around 70%. For emotion detection, the BERT model performs best with 68%, followed by deep learning with 55%. It is worth noting that emotion detection is more challenging due to the subjective interpretation of emotions in the complex communicative context of video gaming on platforms such as Twitch. 
The use of supervised learning techniques, together with the rigorous corpus labeling process and the subsequent corpus pre-processing methodology, has helped to mitigate these challenges, and the algorithms have performed well. The main limitations of the research involve the balance of category and video game representation. Finally, it is important to stress that the integration of machine learning in video games and on Twitch is innovative, allowing the identification of viewers’ emotions on streamers’ channels. This innovation could bring benefits such as a better understanding of audience sentiment, improved content and audience retention, personalized recommendations, and detection of toxic behavior in chats.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"88 ","pages":"Article 101651"},"PeriodicalIF":4.3,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000342/pdfft?md5=fa76bef8f1f9ae5572fb71d8165adda9&pid=1-s2.0-S0885230824000342-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140786978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IKDSumm: Incorporating key-phrases into BERT for extractive disaster tweet summarization","authors":"Piyush Kumar Garg , Roshni Chakraborty , Srishti Gupta , Sourav Kumar Dandapat","doi":"10.1016/j.csl.2024.101649","DOIUrl":"https://doi.org/10.1016/j.csl.2024.101649","url":null,"abstract":"<div><p>Online social media platforms, such as Twitter, are one of the most valuable sources of information during disaster events. Humanitarian organizations, government agencies, and volunteers rely on a concise compilation of such information for effective disaster management. Existing methods to make such compilations are mostly generic summarization approaches that do not exploit domain knowledge. In this paper, we propose a disaster-specific tweet summarization framework, <em>IKDSumm</em>, which first identifies the crucial information in each tweet related to a disaster through that tweet’s key-phrases. We identify these key-phrases by utilizing the domain knowledge (using an existing ontology) of disasters without any human intervention. Further, we utilize these key-phrases to automatically generate a summary of the tweets. Therefore, given tweets related to a disaster, <em>IKDSumm</em> ensures fulfillment of the key summarization objectives, such as information coverage, relevance, and diversity, without any human intervention. We evaluate the performance of <em>IKDSumm</em> against 8 state-of-the-art techniques on 12 disaster datasets. 
The evaluation results show that <em>IKDSumm</em> outperforms existing techniques by approximately <span><math><mrow><mn>2</mn><mo>−</mo><mn>79</mn><mtext>%</mtext></mrow></math></span> in terms of ROUGE-N F1-score.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"87 ","pages":"Article 101649"},"PeriodicalIF":4.3,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140605779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
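The reported metric, ROUGE-N F1-score, measures n-gram overlap between a generated summary and a reference. A minimal version (without the stemming and other options of the official ROUGE toolkit) looks like:

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1: harmonic mean of n-gram precision and recall between a
    candidate summary and a reference (simplified illustration)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())   # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge_n_f1("floods reported in the city center",
                   "severe floods reported in city center", n=1)
```

Higher n (ROUGE-2 and beyond) rewards preserved word order as well as shared vocabulary, which is why summarization papers typically report several values of N.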
{"title":"Yes, I am afraid of the sharks and also wild lions!: A multitask framework for enhancing dialogue generation via knowledge and emotion grounding","authors":"Deeksha Varshney, Asif Ekbal","doi":"10.1016/j.csl.2024.101645","DOIUrl":"https://doi.org/10.1016/j.csl.2024.101645","url":null,"abstract":"<div><p>Current end-to-end neural conversation models inherently lack the capability to generate coherently engaging responses. Efforts to boost informativeness have an adverse effect on emotional and factual accuracy, as validated by several sequence-based models. While these issues can be alleviated by access to emotion labels and background knowledge, there is no guarantee of relevance and informativeness in the generated responses. In real dialogue corpora, informative words, such as named entities and words that carry specific emotions, are often infrequent and hard to model, and a primary challenge for dialogue systems is promoting the model’s capability to generate high-quality responses containing those informative words. Furthermore, earlier approaches depended on straightforward concatenation techniques that lack the representation capability needed to account for human emotions. To address these problems, we propose a novel multitask hierarchical encoder–decoder model, which enhances multi-turn dialogue response generation by incorporating external textual knowledge and relevant emotions. 
Experimental results on a benchmark dataset indicate that our model is superior to competitive baselines in both automatic and human evaluation.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"87 ","pages":"Article 101645"},"PeriodicalIF":4.3,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140621019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}