Computer Speech and Language: Latest Articles

ChatMatch: Exploring the potential of hybrid vision–language deep learning approach for the intelligent analysis and inference of racket sports

IF 3.1, CAS Tier 3 (Computer Science)
Computer Speech and Language Pub Date: 2024-07-25 DOI: 10.1016/j.csl.2024.101694
Jiawen Zhang, Dongliang Han, Shuai Han, Heng Li, Wing-Kai Lam, Mingyu Zhang
Video understanding technology has become increasingly important across disciplines, yet current approaches focus primarily on lower levels of comprehension of video content, making it difficult to provide comprehensive, professional insights at a higher level. Video analysis plays a crucial role in athlete training and strategy development in racket sports. This study demonstrates an innovative, higher-level video comprehension framework (ChatMatch) that integrates computer vision technologies with cutting-edge large language models (LLMs) to enable intelligent analysis and inference over racket sports videos. To examine the feasibility of the framework, a prototype of ChatMatch was deployed for badminton. A vision-based encoder first extracts meta-features, including the locations, actions, gestures, and action results of players in each frame of a match video, followed by a rule-based decoding method that transforms the extracted information into both structured and unstructured knowledge. A set of LLM-based agents, comprising a task identifier, a coach agent, a statistician agent, and a video manager, was developed through prompt engineering and driven by an automated mechanism. The automatic collaborative interaction among the agents enables comprehensive responses to professional inquiries from users. The validation findings showed that the vision models performed excellently in meta-feature extraction, achieving a location identification accuracy of 0.991, an action recognition accuracy of 0.902, and a gesture recognition accuracy of 0.950. Additionally, 100 questions were gathered from four proficient badminton players and one coach to evaluate the LLM-based agents, and ChatMatch produced commendable results across general inquiries, statistical queries, and video retrieval tasks. These findings highlight the potential of this approach to offer valuable insights for athletes and coaches while significantly improving the efficiency of sports video analysis.
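A minimal sketch of the agent-routing idea in the abstract above: a task identifier dispatches a user query to the coach, statistician, or video-manager agent. The agent names follow the abstract, but the keyword rules are invented for illustration and are not the authors' prompt-engineering setup.

```python
# Hypothetical task-identifier routing for a ChatMatch-style agent system.
# Keyword rules below are invented stand-ins for the paper's LLM prompting.

def identify_task(query: str) -> str:
    """Route a user query to one of the specialist agents."""
    q = query.lower()
    if any(w in q for w in ("how many", "count", "percentage", "rate")):
        return "statistician"
    if any(w in q for w in ("clip", "show me", "replay", "video")):
        return "video_manager"
    return "coach"  # tactical/advisory questions fall through to the coach


def answer(query: str) -> str:
    agent = identify_task(query)
    # Each agent would normally call an LLM with its own prompt; here we
    # just report which agent the task identifier selected.
    return f"[{agent}] handling: {query}"


print(answer("How many smashes did player A land?"))
```

In the real framework the agents collaborate automatically; this sketch only shows the first dispatch step.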
Citations: 0

On improving conversational interfaces in educational systems

Computer Speech and Language Pub Date: 2024-07-23 DOI: 10.1016/j.csl.2024.101693
Yuyan Wu, Romina Soledad Albornoz-De Luise, Miguel Arevalillo-Herráez
Conversational Intelligent Tutoring Systems (CITS) have drawn increasing interest in education because of their capacity to tailor learning experiences, improve user engagement, and contribute to the effective transfer of knowledge. Conversational agents employ advanced natural language techniques to engage in a convincing, human-like tutorial conversation. In solving math word problems, a significant challenge is enabling the system to understand user utterances and accurately map extracted entities to the problem quantities required for problem solving, despite the inherent ambiguity of natural language. This study proposes two approaches to enhance the performance of a CITS designed to teach learners to solve arithmetic–algebraic word problems. First, an ensemble approach to intent classification and entity extraction combines the predictions of two distinct models using constraints defined by human experts. This approach leverages the intertwined nature of intents and entities to yield a comprehensive understanding of the user's utterance, ultimately improving semantic accuracy. Second, an adapted Term Frequency–Inverse Document Frequency (TF-IDF) technique associates entities with problem quantity descriptions. Evaluation was conducted on the AWPS and MATH-HINTS datasets, which contain conversational data and a collection of arithmetic and algebraic math problems, respectively. The results demonstrate that the proposed ensemble approach outperforms the individual models, and the proposed entity–quantity matching method surpasses typical text semantic embedding models.
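A hedged sketch of TF-IDF based matching of an entity mention to quantity descriptions, in the spirit of the adapted technique in the abstract above. The quantity descriptions and the smoothing details are invented for illustration; the paper's adaptation is more involved.

```python
import math
from collections import Counter

# Toy entity-to-quantity matching via TF-IDF vectors and cosine similarity.

def tfidf(doc, docs):
    """TF-IDF vector of `doc`, with document frequencies taken from `docs`."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.split()))
    tf = Counter(doc.split())
    # Smoothed IDF so words unseen in `docs` still get a finite, high weight.
    return {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in tf}

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented quantity descriptions for a hypothetical "two trains" problem.
quantities = [
    "speed of the first train",
    "speed of the second train",
    "distance between the two cities",
]

def match(entity_mention):
    """Index of the quantity description most similar to the mention."""
    vecs = [tfidf(q, quantities) for q in quantities]
    e = tfidf(entity_mention, quantities)
    return max(range(len(quantities)), key=lambda i: cosine(e, vecs[i]))

print(quantities[match("the distance the trains cover")])
```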
Citations: 0

A computational analysis of transcribed speech of people living with dementia: The Anchise 2022 Corpus

Computer Speech and Language Pub Date: 2024-07-22 DOI: 10.1016/j.csl.2024.101691
Francesco Sigona, Daniele P. Radicioni, Barbara Gili Fivela, Davide Colla, Matteo Delsanto, Enrico Mensa, Andrea Bolioli, Pietro Vigorelli
Introduction: Automatic linguistic analysis can provide cost-effective, valuable clues for the diagnosis of cognitive difficulties and for therapeutic practice, and hence have a positive impact on wellbeing. In this work, we analyzed transcribed conversations between elderly individuals living with dementia and healthcare professionals. The material came from the Anchise 2022 Corpus, a large collection of transcripts of conversations in Italian recorded in naturalistic conditions. The aim was to test the effectiveness of a number of automatic analyses in finding correlations with the progression of dementia in individuals with cognitive decline, as measured by the Mini-Mental State Examination (MMSE) score, the only psychometric-clinical information available on the participants. Healthy controls (HC) were not considered in this study, nor does the corpus include them. The main innovations and strengths of the work are the high ecological validity of the language analyzed (most of the literature to date concerns controlled language experiments); the use of Italian (few corpora exist for Italian); the size of the data (more than 200 conversations); and the adoption of a wide range of NLP methods, spanning from traditional morphosyntactic investigation to deep linguistic models for analyses such as perplexity, sentiment (polarity), and emotion.

Methods: Analyzing real-world interactions not designed with computational analysis in mind, as is the case for the Anchise Corpus, is particularly challenging. To achieve the research goals, a wide variety of tools were employed, including traditional morphosyntactic analysis based on digital linguistic biomarkers (DLBs), transformer-based language models, sentiment and emotion analysis, and perplexity metrics. Analyses were conducted both on the continuous range of MMSE values and on the severe/moderate/mild categorization suggested by the AIFA (Italian Medicines Agency) guidelines, based on MMSE threshold values.

Results and discussion: Correlations between MMSE and individual DLBs were weak, up to 0.19 for positive and -0.21 for negative correlations. Nevertheless, some correlations were statistically significant and consistent with the literature, suggesting that people with a greater degree of impairment tend to show a reduced vocabulary, to have anomia, to adopt a more informal linguistic register, and to display a simplified use of verbs, with a decrease in the use of participles, gerunds, subjunctive moods, and modal verbs, as well as a flattening of tense usage towards the present to the detriment of the past. The -0.26 inverse correlation between perplexity and MMSE suggests that perplexity captures slightly more specific linguistic information, which can complement the MMSE scores. In the categorization tasks, the clas…
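The correlations reported above (DLBs vs. MMSE, perplexity vs. MMSE) are plain Pearson coefficients. A minimal stdlib-only sketch, with fabricated toy data rather than values from the Anchise corpus:

```python
import math

# Pearson correlation between one linguistic feature and MMSE scores.
# All numbers below are invented toy data for illustration only.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

mmse = [12, 18, 21, 24, 27, 29]                  # toy MMSE scores
ttr = [0.31, 0.35, 0.34, 0.40, 0.42, 0.45]       # toy type-token ratios

print(round(pearson(ttr, mmse), 3))
```

On real data a significance test (e.g. a t-test on the coefficient) would accompany the value, as in the study's "statistically significant" claims.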
Citations: 0

PaSCoNT - Parallel Speech Corpus of Northern-central Thai for automatic speech recognition

Computer Speech and Language Pub Date: 2024-07-22 DOI: 10.1016/j.csl.2024.101692
Supawat Taerungruang, Phimphaka Taninpong, Vataya Chunwijitra, Sumonmas Thatphithakkul, Sawit Kasuriya, Viroj Inthanon, Pawat Paksaranuwat, Salinee Thumronglaohapun, Nawapon Nakharutai, Papangkorn Inkeaw, Jakramate Bootkrajang
This paper proposes the Parallel Speech Corpus of Northern-central Thai (PaSCoNT). The purpose of the research is not only to understand the linguistic differences between Northern and Central Thai but also to use the corpus for automatic speech recognition. The corpus is composed of speech data from dialogues of daily life among Northern Thai people. We designed 2,000 Northern Thai sentences covering all phonemes, in collaboration with linguists specialized in the Northern Thai dialect. The samples are 200 Northern Thai dialect speakers who had been living in Chiang Mai province for more than 18 years. Speech was recorded in both open and closed environments. In each recording session, a speaker read 100 pairs of Northern-Central Thai sentences, ensuring that the paired speech data comes from the same speaker. In total, 100 h of speech were recorded: 50 h of Northern Thai and 50 h of Central Thai. Overall, PaSCoNT consists of 907,832 words and 6,279 vocabulary items. Statistical analysis of the corpus revealed that 49.64% of the words in the lexicon belong to the Northern Thai dialect and 50.36% to the Central Thai dialect, with 1,621 vocabulary items appearing in both. Statistical analysis was also used to examine differences in speech tempo, i.e., time per phoneme (TTP) and syllables per minute (SPM), between Northern and Central Thai. The results revealed statistically significant differences in speech tempo between Central and Northern Thai: the TTP speaking and articulation rate of Central Thai is lower than that of Northern Thai, whereas the SPM speaking and articulation rate of Central Thai is higher. The results also showed that an ASR model trained on the Northern Thai speech corpus yields a lower WER% when tested on Northern Thai speech and a higher WER% when tested on Central Thai speech, and vice versa. However, an ASR model trained on the full PaSCoNT corpus yields a lower WER% on both Northern Thai and Central Thai test data.
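The two tempo measures named above reduce to simple ratios. A small sketch with invented durations (not corpus values):

```python
# Speech-tempo measures from the abstract above: time per phoneme (TTP)
# and syllables per minute (SPM). Durations and counts are toy values.

def time_per_phoneme(duration_s: float, n_phonemes: int) -> float:
    """TTP: seconds of speech per phoneme."""
    return duration_s / n_phonemes

def syllables_per_minute(n_syllables: int, duration_s: float) -> float:
    """SPM: syllables uttered per minute of speech."""
    return n_syllables * 60.0 / duration_s

# A hypothetical 12-second utterance with 90 phonemes and 40 syllables:
print(round(time_per_phoneme(12.0, 90), 4))
print(round(syllables_per_minute(40, 12.0), 1))
```

Speaking rate typically includes pauses in the duration while articulation rate excludes them; the same two functions apply, only the duration fed in changes.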
Citations: 0

Generalizing Hate Speech Detection Using Multi-Task Learning: A Case Study of Political Public Figures

Computer Speech and Language Pub Date: 2024-07-17 DOI: 10.1016/j.csl.2024.101690
Lanqin Yuan, Marian-Andrei Rizoiu
Automatic identification of hateful and abusive content is vital in combating the spread of harmful online content and its damaging effects. Most existing works evaluate models by examining the generalization error on train-test splits of hate speech datasets. These datasets often differ in their definitions and labeling criteria, leading to poor generalization when predicting across new domains and datasets. This work proposes a new Multi-task Learning (MTL) pipeline that trains simultaneously across multiple hate speech datasets to construct a more encompassing classification model. Using a dataset-level leave-one-out evaluation (designating one dataset for testing and jointly training on all others), we trial MTL detection on new, previously unseen datasets. Our results consistently outperform a large sample of existing work: we show strong results on the generalization error in train-test splits and substantial improvements when predicting on previously unseen datasets. Furthermore, we assemble a novel dataset, dubbed PubFigs, focusing on the problematic speech of American public political figures. Using Amazon MTurk, we crowdsource labels for more than 20,000 tweets and machine-label problematic speech in all 305,235 tweets in PubFigs. We find that abusive and hateful tweeting mainly originates from right-leaning figures and relates to six topics, including Islam, women, ethnicity, and immigrants. We show that MTL builds embeddings that can simultaneously separate abusive from hate speech and identify its topics.
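The dataset-level leave-one-out protocol described above can be sketched as a simple split generator: hold one dataset out for testing and jointly train on all the others. Dataset names other than PubFigs are placeholders.

```python
# Dataset-level leave-one-out splits for cross-dataset evaluation.
# Each split designates one dataset for testing and trains on the rest.

def leave_one_out(datasets):
    """Yield (train_datasets, held_out_dataset) pairs, one per dataset."""
    for held_out in datasets:
        train = [d for d in datasets if d != held_out]
        yield train, held_out

names = ["dataset_a", "dataset_b", "dataset_c", "pubfigs"]  # placeholders
for train, test in leave_one_out(names):
    print(f"train on {train}, evaluate on {test!r}")
```

The MTL model itself (shared encoder, one head per training dataset) would be fitted inside the loop; only the splitting logic is shown here.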
Citations: 0

Improving text classification via computing category correlation matrix from text graph

Computer Speech and Language Pub Date: 2024-07-09 DOI: 10.1016/j.csl.2024.101688
Zhen Zhang, Mengqiu Liu, Xiyuan Jia, Gongxun Miao, Xin Wang, Hao Ni, Guohua Wu
In text classification, models have shown remarkable accuracy across various datasets. However, confusion often arises when certain categories within a dataset are too similar, causing misclassification of some samples. This paper proposes an improved method for this problem: a three-layer text graph is built for the corpus and used to calculate a Category Correlation Matrix (CCM). Additionally, the paper introduces category-adaptive contrastive learning on the encoder's text embeddings, enhancing the model's ability to distinguish between samples in easily confused categories. Soft labels are generated from the matrix to guide the classifier, preventing the model from becoming overconfident on one-hot vectors. The efficacy of this approach was demonstrated through experimental evaluations with three text encoders on six different datasets.
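A hedged sketch of the soft-label step described above: blend a one-hot target with the true class's (normalized) CCM row so correlated categories receive some probability mass. The CCM values and the blending weight are invented; in the paper the matrix comes from the three-layer text graph.

```python
# Soft labels from a Category Correlation Matrix (CCM) row.
# `alpha` controls how far the target moves away from pure one-hot.

def soft_labels(ccm_row, true_idx, alpha=0.1):
    """Blend a one-hot target with the normalized CCM row of the true class."""
    total = sum(ccm_row)
    labels = [alpha * v / total for v in ccm_row]
    labels[true_idx] += 1.0 - alpha
    return labels

# Toy CCM row for class 0: it correlates strongly with class 1.
row = [1.0, 0.8, 0.2]
print([round(x, 3) for x in soft_labels(row, true_idx=0)])
```

Training against such targets with cross-entropy penalizes overconfident one-hot predictions while still favoring the true class.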
Citations: 0

C-KGE: Curriculum learning-based Knowledge Graph Embedding

Computer Speech and Language Pub Date: 2024-07-08 DOI: 10.1016/j.csl.2024.101689
Diange Zhou, Shengwen Li, Lijun Dong, Renyao Chen, Xiaoyue Peng, Hong Yao
Knowledge graph embedding (KGE) aims to embed the entities and relations of knowledge graphs (KGs) into a continuous, low-dimensional vector space. It has been shown to be an effective tool for integrating knowledge graphs to improve various intelligent applications, such as question answering and information extraction. However, previous KGE models ignore the natural order of knowledge learning when learning the embeddings of entities and relations, leaving room for improvement in their performance. Inspired by the easy-to-hard pattern of human knowledge learning, this paper proposes a Curriculum learning-based KGE (C-KGE) model, which learns the embeddings of entities and relations from "basic knowledge" to "domain knowledge". Specifically, a seed set representing the basic knowledge and several knowledge subsets are identified from the KG. Entity overlap is then employed to score the learning difficulty of each subset. Finally, C-KGE trains the entities and relations in each subset according to its learning difficulty score. C-KGE leverages the trained embeddings of the seed set as prior knowledge and learns the knowledge subsets iteratively to transfer knowledge between the seed set and the subsets, smoothing the learning process. Experimental results on real-world datasets demonstrate that the proposed model improves embedding performance while reducing training time. Our code and data will be released later.
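A minimal sketch of the curriculum-ordering idea above: score each knowledge subset by its entity overlap with the seed set, and learn high-overlap (easier) subsets first. The triples and the exact difficulty score are invented for illustration; the paper's scoring may differ.

```python
# Curriculum ordering of knowledge subsets by entity overlap with a seed set.
# Triples are (head, relation, tail); all data below is toy.

def entities(triples):
    """Set of entities appearing as head or tail in a list of triples."""
    return {e for h, _, t in triples for e in (h, t)}

def curriculum(seed, subsets):
    """Order subsets from easy (high seed overlap) to hard (low overlap)."""
    seed_ents = entities(seed)
    def overlap(subset):
        ents = entities(subset)
        return len(ents & seed_ents) / len(ents)
    return sorted(subsets, key=overlap, reverse=True)

seed = [("paris", "capital_of", "france")]
s1 = [("france", "member_of", "eu")]          # shares 'france' with the seed
s2 = [("rhine", "flows_through", "germany")]  # no overlap with the seed
ordered = curriculum(seed, [s2, s1])
print([t[0][0] for t in ordered])
```

Training then proceeds through `ordered`, initializing each subset's entities from the embeddings learned so far.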
Citations: 0

Seq2Seq dynamic planning network for progressive text generation

Computer Speech and Language Pub Date: 2024-07-06 DOI: 10.1016/j.csl.2024.101687
Di Wu, Peng Cheng, Yuying Zheng
Long text generation is a hot topic in natural language processing. To address the problems of insufficient semantic representation and incoherent generated text in existing long-text models, the Seq2Seq dynamic planning network progressive text generation model (DPPG-BART) is proposed. In the data pre-processing stage, a lexical division sorting algorithm is used: to obtain hierarchical sequences of keywords with clear information content, word weight values are calculated and ranked by the TF-IDF of word embeddings. To enhance the input representation, a dynamic planning progressive generation network is constructed, integrating positional features and word embedding features at the input side of the model. At the same time, to enrich the semantic information and expand the content of the text, relevant concept words are generated by a concept expansion module, which is adjusted by a scoring network and a feedback mechanism. Experimental results show that the DPPG-BART model improves over the GPT2-S, GPT2-L, BART, and ProGen-2 approaches in terms of MSJ, B-BLEU, and FBD metric values on long-text datasets from two different domains, CNN and Writing Prompts.
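The keyword-weighting step described above can be sketched as ranking a document's words by TF-IDF to form a keyword hierarchy for progressive generation. The toy corpus stands in for the CNN / Writing Prompts data, and the weighting formula is a common variant, not necessarily the paper's exact one.

```python
import math
from collections import Counter

# Rank a document's words by TF-IDF weight to obtain keyword tiers.

def rank_keywords(doc, corpus):
    """Return the words of `doc` sorted by descending TF-IDF weight."""
    n = len(corpus)
    df = Counter(w for d in corpus for w in set(d.split()))
    tf = Counter(doc.split())
    weight = {w: tf[w] * math.log((1 + n) / (1 + df[w])) for w in tf}
    return sorted(weight, key=weight.get, reverse=True)

corpus = [
    "the storm hit the coast at night",
    "the fishermen stayed in the harbour",
    "the storm passed before dawn",
]
print(rank_keywords(corpus[0], corpus))
```

The top of the ranking would seed the first generation pass, with lower-weighted words added in later, progressively finer passes.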
Citations: 0

Modified R-BERT with global semantic information for relation classification task

Computer Speech and Language Pub Date: 2024-07-06 DOI: 10.1016/j.csl.2024.101686
Yuhua Wang, Junying Hu, Yongli Su, Bo Zhang, Kai Sun, Hai Zhang
The objective of the relation classification task is to extract relations between entities. Recent studies have found that R-BERT (Wu and He, 2019), based on pre-trained BERT (Devlin et al., 2019), achieves very good results on the relation classification task. However, this method takes into account neither the semantic differences between different kinds of entities nor global semantic information. In this paper, we set up two different fully connected layers to account for the semantic difference between subject and object entities. Besides, we build a new module, named the Concat Module, to fully fuse the semantic information among the subject entity vector, the object entity vector, and the representation vector of the whole sentence. In addition, we apply average pooling to acquire a better representation of each entity and add an activation operation with a new fully connected layer after the Concat Module. Modifying R-BERT, we propose a new model named BERT with Global Semantic Information (GSR-BERT) for relation classification tasks. We apply our approach to two datasets: the SemEval-2010 Task 8 dataset and a Chinese character relationship classification dataset. Our approach achieves a significant improvement on both datasets, indicating that it transfers across different datasets. Furthermore, we show that the policies used in our approach are also applicable to the named entity recognition task.
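A pure-Python sketch of the entity-representation step described above: average-pool the token vectors of each entity span, then concatenate the sentence, subject, and object vectors before the classifier. The 3-dimensional toy vectors stand in for BERT hidden states.

```python
# Average pooling over entity-span token vectors, followed by concatenation
# of sentence, subject, and object representations (toy dimensions).

def avg_pool(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def concat_module(sentence_vec, subj_tokens, obj_tokens):
    """Fuse sentence, subject, and object vectors by concatenation."""
    subj = avg_pool(subj_tokens)
    obj = avg_pool(obj_tokens)
    return sentence_vec + subj + obj  # list concatenation = vector concat

cls = [0.1, 0.2, 0.3]                       # whole-sentence representation
subj = [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]   # subject entity spans 2 tokens
obj = [[0.5, 0.5, 0.5]]                     # object entity spans 1 token
print(concat_module(cls, subj, obj))
```

In the full model, the subject and object vectors would first pass through their two separate fully connected layers, and the fused vector through an activation and a final classification layer.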
Citations: 0

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

Computer Speech and Language Pub Date: 2024-07-06 DOI: 10.1016/j.csl.2024.101685
Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
Supervised models for speech enhancement are trained using artificially generated mixtures of clean speech and noise signals. However, the synthetic training conditions may not accurately reflect real-world conditions encountered during testing, and this discrepancy can result in poor performance when the test domain differs significantly from the synthetic training domain. To tackle this issue, the UDASE task of the 7th CHiME challenge aimed to leverage real-world noisy speech recordings from the test domain for unsupervised domain adaptation of speech enhancement models. Specifically, the test domain corresponds to the CHiME-5 dataset, characterized by real multi-speaker, conversational speech recordings made in noisy and reverberant domestic environments, for which ground-truth clean speech signals are not available. In this paper, we present the objective and subjective evaluations of the systems submitted to the CHiME-7 UDASE task and analyze the results. The analysis reveals a limited correlation between subjective ratings and several recently proposed supervised non-intrusive performance metrics for speech enhancement. Conversely, the results suggest that more traditional intrusive objective metrics can be used for in-domain performance evaluation on the reverberant LibriCHiME-5 dataset developed for the challenge. The subjective evaluation indicates that all systems successfully reduced the background noise, but always at the expense of increased distortion. Of the four speech enhancement methods evaluated subjectively, only one demonstrated an improvement in overall quality compared to the unprocessed noisy speech, highlighting the difficulty of the task. The tools and audio material created for the CHiME-7 UDASE task are shared with the community.
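As an example of the kind of traditional intrusive metric mentioned above (one that compares an enhanced estimate against a clean reference), here is a stdlib-only sketch of scale-invariant SDR (SI-SDR). The abstract does not name SI-SDR specifically; it is shown as a representative intrusive metric, and the signals are short toy lists.

```python
import math

# Scale-invariant signal-to-distortion ratio (SI-SDR), in dB.
# Intrusive: requires the clean reference, unlike non-intrusive metrics.

def si_sdr(reference, estimate):
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                   # optimal scaling of the target
    target = [alpha * r for r in reference]
    error = [e - t for e, t in zip(estimate, target)]
    num = sum(t * t for t in target)
    den = sum(x * x for x in error)
    return 10.0 * math.log10(num / den)

ref = [0.0, 1.0, -1.0, 0.5]          # toy clean signal
noisy = [0.1, 0.9, -1.1, 0.6]        # toy enhanced estimate
print(round(si_sdr(ref, noisy), 2))
```

Because the target is rescaled by the optimal `alpha`, multiplying the estimate by any constant leaves the score unchanged, which is the "scale-invariant" property.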
Citations: 0