Data Quality Controlling for Cross-Lingual Sentiment Classification

2013 International Conference on Asian Language Processing
Pub Date: 2013-08-17
DOI: 10.1109/IALP.2013.43

Shoushan Li, Yunxia Xue, Zhongqing Wang, Sophia Yat-Mei Lee, Chu-Ren Huang

Citations: 1

Abstract

Cross-lingual sentiment classification aims to perform sentiment classification in one language (the target language) with the help of resources from another language (the source language). Previous studies tend to use all available data in the source language, yet using all of the data is observed to perform no better, and sometimes worse, than using a portion of good data. In this paper, we propose a novel task, data quality controlling in the source language, which selects high-quality samples from the source language. To tackle this task, we propose two kinds of data quality measurements, intra-quality and extra-quality measurements, which are implemented with certainty and similarity measurements respectively. Empirical studies demonstrate the effectiveness of the proposed approach to data quality controlling in the source language.
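The abstract names the two measurements but does not define them, so the following Python sketch is only one plausible reading and not the authors' method. It assumes that certainty is the largest class posterior a classifier assigns to a source sample, that similarity is the cosine between a source sample and the centroid of target-language samples in a shared feature space, and that the two scores are combined by multiplication. The function names, the `keep_ratio` parameter, and the combination rule are all hypothetical.

```python
import math


def certainty(posteriors):
    """Assumed intra-quality score: the largest class posterior of a sample."""
    return max(posteriors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def select_high_quality(source, target_vectors, keep_ratio=0.5):
    """Return indices of the source samples kept after quality filtering.

    source: list of (feature_vector, class_posteriors) pairs.
    The product of certainty and similarity is an assumed combination rule.
    """
    center = centroid(target_vectors)
    scored = [
        (certainty(post) * cosine(vec, center), i)
        for i, (vec, post) in enumerate(source)
    ]
    # Highest score first; ties are broken by original position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    k = max(1, int(len(source) * keep_ratio))
    return sorted(i for _, i in scored[:k])
```

With such a filter, a classifier would be trained only on the returned subset of source-language samples instead of on all of them, which is the comparison the abstract describes.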


