Effectiveness of preprocessing techniques over social media texts for the improvement of machine learning based classifiers

2019 XLV Latin American Computing Conference (CLEI). Pub Date: 2019-09-01. DOI: 10.1109/CLEI47609.2019.235076

L. Esnaola, Juan Pablo Tessore, Hugo Ramón, C. Russo

Abstract

The language used on social networks is usually more informal than that of traditional sources. Studies that take such content as input for machine-learning-based classification algorithms perform a cleaning and standardization process as a first step, with the goal of improving classification accuracy. In this paper, several cleaning tasks are defined and executed over a dataset of comments extracted from the social network Facebook. The goal is to verify whether the corrections made by these tasks produce a significant improvement in the accuracy reached by the classification algorithms. The results indicate that, on this type of dataset, preprocessing tasks that perform reasonably well at correcting errors do not necessarily produce a noteworthy improvement in the classification accuracy reached by the algorithms.
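
As a rough illustration of what cleaning tasks over social media comments can look like, the sketch below applies a few common normalization steps and defines the accuracy measure used to compare classifiers trained on raw versus cleaned text. The abstract does not list the paper's actual tasks, so the steps chosen here (lowercasing, URL removal, capping repeated characters, whitespace collapsing) and the names `clean_comment` and `accuracy` are assumptions made for illustration, not the authors' method.

```python
import re

# Illustrative patterns only; the paper's own cleaning tasks are not specified in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
REPEAT_RE = re.compile(r"(.)\1{2,}")   # a character repeated three or more times
SPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Apply a few hypothetical cleaning steps to a single comment."""
    text = text.lower()
    text = URL_RE.sub(" ", text)           # drop links
    text = REPEAT_RE.sub(r"\1\1", text)    # cap character runs at two
    text = SPACE_RE.sub(" ", text)         # collapse whitespace
    return text.strip()


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


if __name__ == "__main__":
    print(clean_comment("Sooooo   GOOD!!! see http://example.com/x"))
    print(accuracy([1, 0, 1], [1, 1, 1]))
```

To reproduce the kind of comparison the abstract describes, one would train the same classifier once on the raw comments and once on the output of `clean_comment`, then compare the two `accuracy` values on a held-out set.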
