Cross-domain corpus selection for cold-start context

IF 1.8 · Zone 4 (Management) · JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS
Wei-Ching Hsiao, Hei Chia Wang
Journal of Information Science · Journal Article · Published 2024-07-25
DOI: 10.1177/01655515241263283 (https://doi.org/10.1177/01655515241263283)
Citations: 0

Abstract

Sentiment analysis is a powerful tool for monitoring attitudes towards companies, products or services and identifying specific features that drive positive or negative sentiment. However, collecting labelled data for training sentiment analysis models in a specific domain can be challenging in practical applications. One promising solution to this ‘cold-start’ problem is domain adaptation, which leverages labelled data from a related source domain to train a model for the target domain. A critical yet often neglected aspect in prior research is the measurement of similarity between the source and target domains, a factor that greatly impacts the success of domain adaptation. To fill this gap, we propose a novel measure that combines semantic, syntactic and lexical features to assess corpus-level similarity between two domains. Our experimental results demonstrate that our method achieves high precision (0.91) and recall (0.75), outperforming traditional methods. Moreover, our proposed measure can assist new domain products in selecting the most suitable training data set for their sentiment analysis tasks.
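The abstract does not say how the semantic, syntactic and lexical components are computed or weighted. Purely to illustrate the general idea of ranking candidate source corpora by a combined similarity score, here is a minimal Python sketch under stated assumptions; it is not the authors' measure. The lexical component is a swapped-in stand-in (1 minus the base-2 Jensen-Shannon divergence between unigram distributions), any semantic or syntactic scores are assumed to be precomputed by the caller, and the helper names (`unigram_dist`, `js_similarity`, `combined_similarity`, `rank_sources`) and the weights are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical helpers for illustration only; not the measure proposed in the paper.


def unigram_dist(corpus: list[str]) -> dict[str, float]:
    """Relative token frequencies (lowercased, whitespace-tokenized) over a corpus."""
    counts = Counter(tok for doc in corpus for tok in doc.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """1 minus the base-2 Jensen-Shannon divergence: 1.0 = identical, 0.0 = disjoint."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(a: dict[str, float]) -> float:
        # KL(a || m), skipping zero-probability tokens (they contribute nothing).
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 1.0 - 0.5 * (kl_to_m(p) + kl_to_m(q))


def combined_similarity(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-aspect scores; `scores` must cover every weighted aspect."""
    return sum(w * scores[aspect] for aspect, w in weights.items()) / sum(weights.values())


def rank_sources(
    target: list[str],
    sources: dict[str, list[str]],
    weights: dict[str, float],
    extra_scores: dict[str, dict[str, float]] | None = None,
) -> list[tuple[str, float]]:
    """Rank candidate source corpora by combined similarity to the target, best first.

    Only the "lexical" score is computed here; other aspects (e.g. an assumed
    "semantic" or "syntactic" score) must be supplied via `extra_scores`.
    """
    target_dist = unigram_dist(target)
    ranked = []
    for name, corpus in sources.items():
        scores = {"lexical": js_similarity(target_dist, unigram_dist(corpus))}
        scores.update((extra_scores or {}).get(name, {}))
        ranked.append((name, combined_similarity(scores, weights)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy corpora, invented for illustration.
    target = ["great battery life", "battery drains fast"]
    sources = {
        "phones": ["battery life is great", "screen and battery"],
        "movies": ["the plot was dull", "great acting"],
    }
    print(rank_sources(target, sources, weights={"lexical": 1.0}))
```

Dividing by the sum of the weights keeps the combined score in [0, 1] whenever each per-aspect score is in [0, 1], so aspects measured on different underlying scales should be normalised before they are passed in.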

Source journal: Journal of Information Science
Subject category: Engineering and Technology / Computer Science: Information Systems
CiteScore: 6.80
Self-citation rate: 8.30%
Articles published: 121
Review time: 4 months
About the journal: The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.