Maximum Entropy combined FSM stemming method for Uyghur

2009 Oriental COCOSDA International Conference on Speech Database and Assessments. Pub Date: 2009-10-02. DOI: 10.1109/ICSDA.2009.5278378

Aishan Wumaier, Zaokere Kadeer, Parida Tursun, Shengwei Tian

Citations: 2

Abstract

This paper presents a stemming algorithm for Uyghur that combines a deterministic finite automaton (DFA) over noun suffixes with a Maximum Entropy (MaxEnt) model. Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight suffixes resemble the ending part of some words, which makes the FSM ambiguous. We apply the MaxEnt model to resolve this ambiguity. The paper describes the steps of generating the FSM, building the MaxEnt suffix identification model, and combining MaxEnt with the FSM.
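
The reverse-order suffix stripping and the disambiguation hook described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the suffix inventory is a small, simplified set in Latin transliteration chosen for the example, the possessive slot and all phonological alternations are ignored, and the names `REVERSE_SLOTS`, `stem`, `always_suffix`, and `min_stem` are hypothetical. The slot-ordered right-to-left scan stands in for the paper's DFA, and the `is_suffix` callback stands in for the MaxEnt suffix identification model, which is not implemented here.

```python
from typing import Callable, List, Tuple

# Simplified, illustrative suffix slots (an assumption, not the paper's inventory).
# Surface order is stem + plural + case, so a right-to-left scan meets case first.
# Within a slot, longer suffixes are listed before shorter ones.
REVERSE_SLOTS: List[Tuple[str, Tuple[str, ...]]] = [
    ("case", ("ning", "din", "ni", "da", "de")),
    ("plural", ("lar", "ler")),
]


def always_suffix(word: str, suffix: str) -> bool:
    """Placeholder for the disambiguation model: accept every match."""
    return True


def stem(
    word: str,
    is_suffix: Callable[[str, str], bool] = always_suffix,
    min_stem: int = 2,
) -> Tuple[str, List[str]]:
    """Strip at most one suffix per slot, scanning from the end of the word.

    `is_suffix(word, suffix)` is a hypothetical hook where a classifier
    (MaxEnt in the paper) would decide whether a matched ending is a real
    suffix or merely part of the stem.
    """
    stripped: List[str] = []
    for _slot, suffixes in REVERSE_SLOTS:
        for suffix in suffixes:
            rest = word[: -len(suffix)]
            if (
                word.endswith(suffix)
                and len(rest) >= min_stem
                and is_suffix(word, suffix)
            ):
                word = rest
                stripped.append(suffix)
                break  # move on to the next slot to the left
    stripped.reverse()  # report suffixes in surface order
    return word, stripped


if __name__ == "__main__":
    print(stem("kitablarni"))
    # An ending that only looks like a suffix is stripped by the bare scan...
    print(stem("payda"))
    # ...unless the disambiguation hook rejects it.
    print(stem("payda", is_suffix=lambda w, s: (w, s) != ("payda", "da")))
```

The string `payda` is used only as an example of a form whose ending coincides with the suffix `da`. Without the hook, the scan cannot tell such an ending from a real suffix, which is the ambiguity the abstract assigns to the MaxEnt model.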

