Computer Speech and Language: Latest Articles

ViTASA: New benchmark and methods for Vietnamese targeted aspect sentiment analysis for multiple textual domains
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-03-27 | DOI: 10.1016/j.csl.2025.101800
Khanh Quoc Tran, Quang Phan-Minh Huynh, Oanh Thi-Hong Le, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Abstract: Targeted Aspect Sentiment Analysis (TASA) has gained substantial traction in recent years, fostering diverse studies and technological advances. However, the development of TASA resources for Vietnamese has been limited. This paper introduces ViTASA, a comprehensive, high-quality dataset designed to catalyze advances in Vietnamese TASA. ViTASA encompasses over 500,000 target-aspect pairs from social media comments across three key domains (mobile, restaurant, and hotel), addressing critical gaps in existing datasets. Additionally, ViTASA integrates a novel multi-task evaluation framework, posing new challenges and enabling robust model assessment. We present ViTASD, an innovative BERT-based approach optimized for the linguistic features of Vietnamese. Comparative analyses demonstrate that ViTASD significantly outperforms existing state-of-the-art methods, including CG-BERT, QACG-BERT, BERT-pair-QA, BERT-pair-NLI, and a range of zero-shot learning models such as Gemma, Llama, Mistral, and Qwen. Notably, ViTASD achieves superior macro F1-scores of 61.77%, 41.12%, and 52.64% in the mobile, restaurant, and hotel domains, respectively. This study not only highlights the challenges inherent in Vietnamese sentiment analysis but also lays a robust foundation for future research in this area. To advance TASA technology and enhance the reliability of digital media analyses, we have made the ViTASA dataset, model checkpoints, and source code openly accessible on GitHub.
Citations: 0
Exploiting spatial information and target speaker phoneme loss for multichannel directional speech enhancement and recognition
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-03-26 | DOI: 10.1016/j.csl.2025.101801
Cong Pang, Ye Ni, Lin Zhou, Li Zhao, Feifei Xiong
Abstract: Directional speech extraction has attracted increasing attention in multichannel speech separation, as it focuses solely on extracting the target speech, making real-time communication (RTC) and automatic speech recognition (ASR) more productive. This work investigates a real-time multichannel neural framework for directional speech enhancement and recognition that exploits both the explicit spatial information derived from the microphone array geometry and the implicit spatial information learned by a dedicated narrow-band network. In addition to traditional signal-based loss functions, we introduce a loss inspired by ASR phoneme mismatch to guide framework training toward distortion-less target speech signals. Experimental results on simulated datasets show that the proposed framework significantly improves the speech quality of the target speaker located in the specified direction in noisy and reverberant environments with interfering speakers. Improved ASR results on the real-recorded dataset of live conversations from the CHiME8 MMCSG Challenge further verify the effectiveness of our system for practical applications.
Citations: 0
LRetUNet: A U-Net-based retentive network for single-channel speech enhancement
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-03-24 | DOI: 10.1016/j.csl.2025.101798
Yuxuan Zhang, Zipeng Zhang, Weiwei Guo, Wei Chen, Zhaohai Liu, Houguang Liu
Abstract: Speech enhancement is an essential component of many user-oriented audio applications and a fundamental task for robust speech processing. Although numerous speech enhancement methods have been proposed and show strong performance, a notable gap persists in lightweight solutions that effectively balance performance with computational efficiency. This paper addresses that gap by introducing a novel approach to speech enhancement that integrates a retentive mechanism within a U-Net architecture. The primary innovation of the proposed method is the design and implementation of a high-frequency future filter module, which uses the Fast Fourier Transform (FFT) to improve the model's capacity to preserve and process the high-frequency information essential for speech clarity. This module, in conjunction with the retentive mechanism, enables the network to preserve essential features across layers, resulting in improved enhancement performance. The proposed method was assessed on the DNS (Deep Noise Suppression) and VoiceBank+DEMAND datasets, widely recognized benchmarks in speech enhancement. The experimental results demonstrate that the proposed method achieves competitive performance while maintaining relatively low computational complexity, which makes it particularly suitable for real-time applications, where both performance and efficiency are critical.
Citations: 0
E2EPref: An end-to-end preference-based framework for speech quality assessment to alleviate bias in direct assessment scores
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-03-23 | DOI: 10.1016/j.csl.2025.101799
Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda
Abstract: In speech quality assessment (SQA), direct assessment (DA) scores are frequently used as the training objective. However, because DA scores carry listener-wise bias and equal-range bias, the scores predicted by models trained on them do not always reflect the true quality score. In this study, we apply preference-based learning to SQA by transforming the DA score prediction framework into a preference prediction framework. Our proposed End-to-End Preference-based framework (E2EPref) for SQA is designed to predict system-level quality scores directly. It contains four proposed components: pair generation, a preference function, threshold selection, and preference aggregation. Through these components, E2EPref aims to mitigate the biases introduced by training directly on DA scores. Experiments show that the framework helps the SQA model alleviate these biases, yielding higher system-level Spearman's rank correlation and linear correlation coefficients. Additionally, we evaluate the quality prediction capability of the framework in a zero-shot out-of-domain scenario. Finally, we collect subjective preference scores on a dataset that already contains DA scores and analyze the advantages and disadvantages of using DA scores versus subjective preference scores as the ground truth or for model training.
Citations: 0
Summary of the NOTSOFAR-1 challenge: Highlights and learnings
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-03-16 | DOI: 10.1016/j.csl.2025.101796
Igor Abramovski, Alon Vinnikov, Shalev Shaer, Naoyuki Kanda, Xiaofei Wang, Amir Ivry, Eyal Krupka
Abstract: The first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR-1) Challenge is a pivotal initiative that sets new benchmarks by offering datasets more representative of real-world business applications than those previously available. The challenge provides a unique combination of 315 recorded meetings across 30 diverse environments, capturing real-world acoustic conditions and conversational dynamics, and a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization and incorporating 15,000 real acoustic transfer functions. In this paper, we provide an overview of the systems submitted to the challenge and analyze the top-performing approaches, hypothesizing the factors behind their success. Additionally, we highlight promising directions left unexplored by participants. By presenting key findings and actionable insights, this work aims to drive further innovation and progress in distant automatic speech recognition (DASR) research and applications.
Citations: 0
DDP-Unet: A mapping neural network for single-channel speech enhancement
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-03-13 | DOI: 10.1016/j.csl.2025.101795
Haoxiang Chen, Yanyan Xu, Dengfeng Ke, Kaile Su
Abstract: For speech enhancement tasks, effective use of the time-frequency spectrum is crucial, as it improves audio feature extraction while reducing computational cost. Among current time-frequency-domain speech enhancement methods, DenseBlock and the dual-path transformer have demonstrated promising results. In this paper, to further improve speech enhancement performance, we optimize these two modules and propose a novel mapping neural network, DDP-Unet, which comprises three components: the encoder, the decoder, and the bottleneck. First, we introduce a lightweight module, the depth-point convolutional layer (DPCL), which employs point-wise and depth-wise convolutions. DPCL is then integrated into our novel DCdenseBlock, expanding DenseBlock's receptive field and enhancing feature fusion in the encoder and decoder stages. Additionally, to increase the breadth and depth of feature fusion in the dual-path transformer, we implement a deep dual-path transformer as the bottleneck. DDP-Unet is evaluated on two public datasets, VCTK+DEMAND and DNS Challenge 2020. Experimental results demonstrate that DDP-Unet outperforms most existing models, achieving state-of-the-art performance on the STOI, PESQ, and SI-SDR metrics.
Citations: 0
A novel Adaptive Kolmogorov Arnold Sparse Masked Attention Model with multi-loss optimization for Acoustic Echo Cancellation in double-talk noisy scenario
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-03-06 | DOI: 10.1016/j.csl.2025.101786
Soni Ishwarya V., Mohanaprasad K.
Abstract: In recent years, deep learning techniques have emerged as the predominant approach for Acoustic Echo Cancellation (AEC), owing to their capacity to model complex, nonlinear patterns effectively. This paper presents a novel Adaptive Kolmogorov Arnold Network-Based Sparse Masked Attention Model (KASMA-LossNet) with multi-loss optimization, inspired by the Kolmogorov-Arnold representation theorem. The model is designed to capture complex nonlinear patterns, improving speech quality and echo cancellation effectiveness while reducing the model's computational load. It effectively simplifies complex nonlinear multivariate functions into univariate representations, which is crucial for handling the intricate nonlinear aspects of echo. The KAN-based attention module is designed to capture dense speech patterns and analyze the relationships between the echo, the noise, and the target signal. It also excels at identifying long-range dependencies within the signal, assigning weight scores based on their relevance to the task, and offering exceptional flexibility, enabling the model to adapt to diverse acoustic conditions. To enhance training efficiency, three losses (smooth L1 loss, magnitude loss, and log-spectral distance (LSD) loss) are combined and integrated into the model, accelerating convergence, speeding up training, and delivering more precise results. The proposed model was implemented and tested, demonstrating notable improvements in echo return loss enhancement (ERLE) and perceptual evaluation of speech quality (PESQ). The reduced computational load of the proposed system is demonstrated through steady GPU utilization and shorter convergence time.
Citations: 0
A bias evaluation solution for multiple sensitive attribute speech recognition
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-03-04 | DOI: 10.1016/j.csl.2025.101787
Zigang Chen, Yuening Zhou, Zhen Wang, Fan Liu, Tao Leng, Haihua Zhu
Abstract: Speech recognition systems are a pervasive application of artificial intelligence (AI), bringing significant benefits to society. However, they also face significant fairness issues: when dealing with groups of people with different sensitive attributes, these systems tend to exhibit bias, which may lead to the voices of specific groups being misinterpreted or ignored. To address this fairness issue, it is crucial to comprehensively reveal the presence of bias in AI systems. To overcome the limited categories and data imbalance of existing bias evaluation datasets, we propose a new method for constructing evaluation datasets. Given the unique characteristics of speech recognition systems, existing AI bias evaluation methods may not be directly applicable; we therefore introduce a bias evaluation method for speech recognition systems based on word error rate (WER). To comprehensively quantify bias across different groups, we combine multiple evaluation metrics, including WER, fairness metrics, and confusion-matrix-based metrics (CMBM). To ensure a thorough evaluation, experiments were conducted on both single sensitive attributes and cross-sensitive attributes. The experimental results indicate that, for single sensitive attributes, the speech recognition system exhibits the most significant racial bias, while in the evaluation of cross-sensitive attributes, the system shows the greatest bias against white males and black males. Finally, t-tests demonstrate that the WER differences between these two groups are statistically significant.
Citations: 0
GenCeption: Evaluate vision LLMs with unlabeled unimodal data
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-02-28 | DOI: 10.1016/j.csl.2025.101785
Lele Cao, Valentin Buchner, Zineb Senane, Fangkai Yang
Abstract: Multimodal Large Language Models (MLLMs) are typically assessed using expensive annotated multimodal benchmarks, which often lag behind the rapidly evolving demands of MLLM evaluation. This paper outlines and validates GenCeption, a novel, annotation-free evaluation method that requires only unimodal data to measure inter-modality semantic coherence and, inversely, to assess MLLMs' tendency to hallucinate. This approach eliminates the need for costly data annotation, minimizes the risk of training data contamination, is expected to result in slower benchmark saturation, and avoids the illusion of emerging abilities. Inspired by the DrawCeption game, GenCeption begins with a non-textual sample and proceeds through iterative description and generation steps. The semantic drift across iterations is quantified using the GC@T metric. While GenCeption is in principle applicable to MLLMs across various modalities, this paper focuses on its implementation and validation for Vision LLMs (VLLMs). Based on the GenCeption method, we establish the MMECeption benchmark for evaluating VLLMs and compare the performance of several popular VLLMs and human annotators. Our empirical results validate GenCeption's effectiveness, demonstrating strong correlations with established VLLM benchmarks. VLLMs still lag significantly behind human performance and struggle especially with text-intensive tasks.
Citations: 0
LSRD-Net: A fine-grained sentiment analysis method based on log-normalized semantic relative distance
IF 3.1 | CAS Q3 | Computer Science
Computer Speech and Language | Pub Date: 2025-02-25 | DOI: 10.1016/j.csl.2025.101782
Liming Zhou, Xiaowei Xu, Xiaodong Wang
Abstract: With the development of AI technology and increasing application demands, fine-grained sentiment analysis is gradually replacing sentence-level and document-level coarse-grained sentiment analysis. However, most existing fine-grained sentiment analysis (i.e., aspect-based sentiment analysis) relies heavily on the traditional attention mechanism and does not incorporate prior knowledge to help the model focus on aspect-related sentiment, ignoring the importance of aligning aspect terms with sentiment information. Considering the linguistic conventions of how emotions are expressed, we propose a Log-SRD-based neural network model named LSRD-Net, which aims to improve the recognition accuracy and alignment efficiency of aspect terms and sentiment tendencies. The model uses a logarithmic function to normalize the semantic relative distance (SRD) matrix, introduces the normalized matrix into the attention computation as prior knowledge, and improves the alignment of aspect terms and sentiment information through an improved cross-attention mechanism. To validate the effectiveness of LSRD-Net, several comparative and ablation experiments are conducted on four fine-grained sentiment analysis datasets. The analysis and evaluation of the experimental results demonstrate that LSRD-Net achieves state-of-the-art performance.
Citations: 0