Using longest common subsequence and character models to predict word forms

Special Interest Group on Computational Morphology and Phonology Workshop Pub Date : 1900-01-01 DOI:10.18653/v1/W16-2009

A. Sorokin

Citations: 19

Abstract

This paper presents an algorithm for automatic word form inflection. We use the longest common subsequence method to extract abstract paradigms from given pairs of basic and inflected word forms, and suffix and prefix features to predict the paradigm automatically. We extend this algorithm with a combination of affix feature-based and character n-gram models, which substantially improves performance, especially for languages with nonlocal phenomena such as vowel harmony. Our system took part in the SIGMORPHON 2016 Shared Task, placing 3rd in 17 of 30 subtasks and 4th in 7 subtasks among 7 participants.
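To make the paradigm-extraction step concrete, the following is a minimal sketch of how a longest common subsequence can turn one (basic form, inflected form) pair into an abstract pattern. It is an illustration under stated assumptions, not the paper's implementation: the function names `lcs_alignment`, `abstract_paradigm`, and `render`, the `1+i+2` pattern notation, and the rule that a variable is a run of matched characters contiguous in both words are all chosen here for the example.

```python
def lcs_alignment(a, b):
    """Return index pairs (i, j), a[i] == b[j], forming one longest common subsequence."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def render(word, var_of):
    """Write a word as variables (digits) and constant fragments joined by '+'."""
    parts = []
    prev_var = 0  # sentinel: no character seen yet
    for k, ch in enumerate(word):
        v = var_of.get(k)
        if v is not None:
            if v != prev_var:          # a new variable starts here
                parts.append(str(v))
        elif prev_var is None and parts:
            parts[-1] += ch            # extend the current constant fragment
        else:
            parts.append(ch)           # start a new constant fragment
        prev_var = v
    return "+".join(parts)


def abstract_paradigm(lemma, form):
    """Assumed convention: each run of LCS matches contiguous in both words is one variable."""
    var_of_lemma, var_of_form = {}, {}
    var, prev = 0, None
    for i, j in lcs_alignment(lemma, form):
        if prev is None or (i, j) != (prev[0] + 1, prev[1] + 1):
            var += 1
        var_of_lemma[i] = var
        var_of_form[j] = var
        prev = (i, j)
    return render(lemma, var_of_lemma), render(form, var_of_form)


print(abstract_paradigm("sing", "sang"))    # ('1+i+2', '1+a+2')
print(abstract_paradigm("walk", "walked"))  # ('1', '1+ed')
```

Words that yield the same pair of patterns would share an abstract paradigm in this sketch, so predicting the inflected form reduces to choosing a pattern pair for a new basic form, for example from its prefixes and suffixes as the abstract describes. When several longest common subsequences exist, this sketch simply keeps the one its traceback finds.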
