Adaptive training instance selection for cross-domain emotion identification

Wenbo Wang, Lu Chen, Keke Chen, K. Thirunarayan, A. Sheth
{"title":"Adaptive training instance selection for cross-domain emotion identification","authors":"Wenbo Wang, Lu Chen, Keke Chen, K. Thirunarayan, A. Sheth","doi":"10.1145/3106426.3106457","DOIUrl":null,"url":null,"abstract":"This paper exploits a large number of self-labeled emotion tweets as the training data from the source domain to improve emotion identification in target domains (i.e., blogs and fairy tales), where there is a short supply of labeled data. Due to the noisy and ambiguous nature of self-labeled emotion training data, the existing domain adaptation methods that typically depend on high-quality labeled source-domain data do not work satisfactorily. This paper describes an adaptive source-domain training instance selection method to address the problem of noisy source-domain training data. The proposed approach can effectively identify the most informative training examples based on three carefully designed measures: consistency, diversity, and similarity. It uses an iterative method that consists of the following steps in each iteration: selecting informative samples from the source domain with the informativeness measures, merging with the target-domain training data, evaluating the performance of learned classifier for the target domain, and updating the informativeness measures for the next iteration. It stops until no new training instance is selected or in a designated number of iterations. Experiments show that our approach performs effectively for cross-domain emotion identification and consistently outperforms baseline approaches across four domains.","PeriodicalId":20685,"journal":{"name":"Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics","volume":"294 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2017-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3106426.3106457","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

This paper exploits a large number of self-labeled emotion tweets as the training data from the source domain to improve emotion identification in target domains (i.e., blogs and fairy tales), where there is a short supply of labeled data. Due to the noisy and ambiguous nature of self-labeled emotion training data, the existing domain adaptation methods that typically depend on high-quality labeled source-domain data do not work satisfactorily. This paper describes an adaptive source-domain training instance selection method to address the problem of noisy source-domain training data. The proposed approach can effectively identify the most informative training examples based on three carefully designed measures: consistency, diversity, and similarity. It uses an iterative method that consists of the following steps in each iteration: selecting informative samples from the source domain with the informativeness measures, merging with the target-domain training data, evaluating the performance of learned classifier for the target domain, and updating the informativeness measures for the next iteration. It stops until no new training instance is selected or in a designated number of iterations. Experiments show that our approach performs effectively for cross-domain emotion identification and consistently outperforms baseline approaches across four domains.
跨领域情感识别的自适应训练实例选择
本文利用大量自标记的情感推文作为源域的训练数据,改进了标记数据不足的目标域(即博客和童话)的情感识别。由于自标记情感训练数据具有噪声和模糊性,现有的基于高质量标记源域数据的领域自适应方法效果不理想。针对源域训练数据中存在噪声的问题,提出了一种自适应源域训练实例选择方法。所提出的方法可以基于三个精心设计的度量:一致性、多样性和相似性,有效地识别信息最多的训练样例。它采用迭代方法,在每次迭代中包括以下步骤:从具有信息度量的源域中选择信息样本,与目标域训练数据合并,评估学习到的分类器在目标域的性能,更新下一次迭代的信息度量。它会停止,直到没有新的训练实例被选择或在指定的迭代次数。实验表明,我们的方法在跨领域情感识别方面表现有效,并且在四个领域中始终优于基线方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信