Using genetic algorithms in word-vector optimisation

2010 UK Workshop on Computational Intelligence (UKCI). Pub Date: 2010-11-09. DOI: 10.1109/UKCI.2010.5625589

P. Smith

Citations: 2

Abstract

Word vectors and word sets are used in a wide range of text-based applications, yet the words they contain are often chosen on an ad hoc basis. In this study, we examine two such applications, authorship attribution and sentiment analysis, and find in both cases that classification precision can be improved by optimising the word set with a fairly simple genetic algorithm. In authorship attribution, the trend in recent years has been towards ever larger word vectors [1,2]. We suggest that this may be counter-productive, as it can easily lead to inaccuracy caused by overfitting or vector-space sparsity (the curse of dimensionality). In sentiment analysis, precision is the main issue, as rates above 80–85% are not easy to achieve.
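The abstract does not describe the genetic algorithm's encoding, operators, or classifier, so the following Python sketch is only an illustration of the general idea, not the author's method. It assumes a bit-mask chromosome over a candidate vocabulary, tournament selection, one-point crossover, bit-flip mutation, and a leave-one-out 1-nearest-neighbour classifier as the fitness function. All names (`evolve_word_set`, `loo_accuracy`, `vectorise`), parameter values, and the toy documents are hypothetical.

```python
import random
from collections import Counter

def vectorise(doc, words):
    """Relative frequency of each selected word in a whitespace-tokenised document."""
    counts = Counter(doc.split())
    total = max(1, sum(counts.values()))
    return [counts[w] / total for w in words]

def loo_accuracy(mask, docs, labels, vocab):
    """Fitness (assumed): leave-one-out 1-NN accuracy using only the masked-in words."""
    words = [w for w, keep in zip(vocab, mask) if keep]
    if not words:
        return 0.0  # an empty word set cannot classify anything
    vecs = [vectorise(d, words) for d in docs]
    correct = 0
    for i, v in enumerate(vecs):
        nearest = min(
            (j for j in range(len(vecs)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(v, vecs[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(docs)

def evolve_word_set(docs, labels, vocab, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Search for a word subset with high fitness; needs at least two vocab words."""
    rng = random.Random(seed)
    n = len(vocab)

    def score(mask):
        return loo_accuracy(mask, docs, labels, vocab)

    # Each chromosome is a list of booleans: True means "keep this word".
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = ranked[:2]  # elitism: carry the two best forward unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return [w for w, keep in zip(vocab, best) if keep], score(best)

if __name__ == "__main__":
    # Toy two-author corpus (invented for illustration only).
    docs = ["the cat sat upon the mat", "upon the hill the cat slept",
            "whilst the dog ran on the road", "the dog sat on the step whilst it rained"]
    labels = ["A", "A", "B", "B"]
    vocab = ["the", "upon", "on", "whilst", "cat", "dog"]
    words, acc = evolve_word_set(docs, labels, vocab)
    print(words, acc)
```

Because the two best chromosomes are copied unchanged into each new generation, the best fitness never decreases from one generation to the next. In a realistic setting the fitness would be measured on held-out data rather than the documents used for selection, since optimising the word set against the evaluation data invites the overfitting the abstract warns about.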
