Character-based feature extraction with LSTM networks for POS-tagging task

2016 IEEE 10th International Conference on Application of Information and Communication Technologies (AICT)
Pub Date: 2016-10-01
DOI: 10.1109/ICAICT.2016.7991654

Aibek Makazhanov, Zhandos Yessenbayev

Citations: 7

Abstract

In this paper we describe work in progress on designing continuous vector space word representations that can adequately map unseen data. We propose an LSTM-based feature extraction layer that reads in the sequence of characters corresponding to a word and outputs a single fixed-length real-valued vector. We then test our model on a POS tagging task in four typologically different languages. The experimental results suggest that the model can offer a solution to the out-of-vocabulary (OOV) word problem: in a comparable setting, its OOV accuracy improves over that of a state-of-the-art tagger.
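The feature extraction layer described in the abstract can be illustrated with a minimal sketch. The abstract gives no architectural details, so everything below is an assumption made for illustration: the class name CharLSTMEncoder, the embedding and hidden sizes, the single forward-direction LSTM cell, and the use of the final hidden state as the word vector. The weights are random and untrained, so the output only demonstrates the shape of the computation, not the authors' model or results.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class CharLSTMEncoder:
    """Toy character-level LSTM mapping a word to a fixed-length vector.

    Hypothetical illustration only: sizes, initialisation and the choice of
    the last hidden state as the word vector are assumptions, not details
    taken from the paper.
    """

    def __init__(self, alphabet, char_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Index 0 is reserved for characters outside the alphabet.
        self.char_to_id = {ch: i + 1 for i, ch in enumerate(sorted(set(alphabet)))}
        n_chars = len(self.char_to_id) + 1

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.embeddings = rand_matrix(n_chars, char_dim)
        # One weight matrix and bias per gate, applied to the concatenation [x; h].
        in_dim = char_dim + hidden_dim
        self.weights = {gate: rand_matrix(hidden_dim, in_dim) for gate in "ifog"}
        self.biases = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _affine(self, gate, xh):
        return [
            sum(w * v for w, v in zip(row, xh)) + b
            for row, b in zip(self.weights[gate], self.biases[gate])
        ]

    def encode(self, word):
        """Read the word character by character; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for ch in word:
            x = self.embeddings[self.char_to_id.get(ch, 0)]
            xh = x + h  # list concatenation: [x; h]
            i = [sigmoid(v) for v in self._affine("i", xh)]    # input gate
            f = [sigmoid(v) for v in self._affine("f", xh)]    # forget gate
            o = [sigmoid(v) for v in self._affine("o", xh)]    # output gate
            g = [math.tanh(v) for v in self._affine("g", xh)]  # candidate cell state
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


encoder = CharLSTMEncoder("abcdefghijklmnopqrstuvwxyz")
seen = encoder.encode("tagging")
unseen = encoder.encode("retagging")  # a word never stored in any vocabulary
print(len(seen), len(unseen))
```

Because the encoder depends only on characters, any string receives a vector of the same length regardless of word length, which is the property that lets such a layer produce features for out-of-vocabulary words. In a tagger, these vectors would be fed to a downstream classifier and the weights trained jointly; that part is omitted here.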



