Concepts extraction from unstructured Polish texts: A rule based approach

2015 Federated Conference on Computer Science and Information Systems (FedCSIS) Pub Date : 2015-11-09 DOI:10.15439/2015F280

P. Szwed

Citations: 9

Abstract

We present a recently developed solution for extracting concepts from unstructured Polish texts, with a special focus on the correct morphological forms of the obtained concept names. As Polish is a highly inflected language, detected names need to be transformed following Polish grammar rules. We propose a user-friendly method for specifying transformation patterns, based on a simple annotation language. Annotations prepared by a user are compiled into transformation rules. During the concept extraction process, the input document is split into sentences and the rules are applied to the sequences of words in each sentence. Recognized strings forming concept names are aggregated at various levels and assigned scores. We also report the results of initial experiments performed on a medical text.
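The pipeline outlined in the abstract (sentence splitting, rule matching over word sequences, normalisation of the matched name, aggregation with scores) can be illustrated with a minimal sketch. It does not reproduce the paper's annotation language or rule compiler: the toy lexicon `LEXICON`, the single rule in `RULES`, the tag names, and the use of raw frequency as the score are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: surface form -> (nominative base form, tag).
# A real system would rely on a full Polish morphological analyser.
LEXICON = {
    "zapalenie": ("zapalenie", "NOUN"),
    "zapalenia": ("zapalenie", "NOUN"),
    "zapaleniem": ("zapalenie", "NOUN"),
    "płuc": ("płuc", "NOUN_GEN"),
}

# Hypothetical rule: a head noun followed by a genitive noun forms a concept.
# The flags say which words are normalised to the base form (the head noun)
# and which keep their surface form (the genitive modifier).
RULES = [(("NOUN", "NOUN_GEN"), (True, False))]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-case word tokens; \\w matches Polish letters in Python 3."""
    return re.findall(r"\w+", sentence.lower())


def extract_concepts(text):
    """Apply every rule to every word window and count normalised names."""
    scores = Counter()
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for pattern, normalise in RULES:
            n = len(pattern)
            for i in range(len(tokens) - n + 1):
                window = tokens[i:i + n]
                entries = [LEXICON.get(t) for t in window]
                if any(e is None for e in entries):
                    continue  # unknown word: the rule cannot match here
                if tuple(e[1] for e in entries) != pattern:
                    continue  # tag sequence differs from the rule pattern
                name = " ".join(
                    e[0] if norm else t
                    for t, e, norm in zip(window, entries, normalise)
                )
                scores[name] += 1  # assumed score: frequency in the document
    return scores


text = "Pacjent z zapaleniem płuc. Leczenie zapalenia płuc trwa długo."
concepts = extract_concepts(text)
```

In this sketch, the instrumental form "zapaleniem płuc" and the genitive form "zapalenia płuc" are both mapped to the same nominative name "zapalenie płuc", which is the kind of morphological correction the abstract emphasises.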



