Cataloga: A Software for Semantic-Based Terminological Data Mining

2011 First International Conference on Data Compression, Communications and Processing Pub Date : 2011-06-21 DOI:10.1109/CCP.2011.42

A. Elia, Mario Monteleone, Alberto Postiglione

{"title":"Cataloga: A Software for Semantic-Based Terminological Data Mining","authors":"A. Elia, Mario Monteleone, Alberto Postiglione","doi":"10.1109/CCP.2011.42","DOIUrl":null,"url":null,"abstract":"This paper is focused on Catalog a, a software package based on Lexicon-Grammar theoretical and practical analytical framework and embedding a ling ware module built on compressed terminological electronic dictionaries. We will here show how Catalog a can be used to achieve efficient data mining and information retrieval by means of lexical ontology associated to terminology-based automatic textual analysis. Also, we will show how accurate data compression is necessary to build efficient textual analysis software. Therefore, we will here discuss the creation and functioning of a software for semantic-based terminological data mining, in which a crucial role is played by Italian simple and compound-word electronic dictionaries. Lexicon-Grammar is one of the most profitable and consistent methods for natural language formalization and automatic textual analysis it was set up by French linguist Maurice Gross during the '60s, and subsequently developed for and applied to Italian by Annibale Elia, Emilio D'Agostino and Maurizio Martin Elli. Basically, Lexicon-Grammar establishes morph syntactic and statistical sets of analytic rules to read and parse large textual corpora. The analytical procedure here described will prove itself appropriate for any type of digitalized text, and will represent a relevant support for the building and implementing of Semantic Web (SW) interactive platforms.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 First International Conference on Data Compression, Communications and Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCP.2011.42","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

Abstract

This paper is focused on Catalog a, a software package based on Lexicon-Grammar theoretical and practical analytical framework and embedding a ling ware module built on compressed terminological electronic dictionaries. We will here show how Catalog a can be used to achieve efficient data mining and information retrieval by means of lexical ontology associated to terminology-based automatic textual analysis. Also, we will show how accurate data compression is necessary to build efficient textual analysis software. Therefore, we will here discuss the creation and functioning of a software for semantic-based terminological data mining, in which a crucial role is played by Italian simple and compound-word electronic dictionaries. Lexicon-Grammar is one of the most profitable and consistent methods for natural language formalization and automatic textual analysis it was set up by French linguist Maurice Gross during the '60s, and subsequently developed for and applied to Italian by Annibale Elia, Emilio D'Agostino and Maurizio Martin Elli. Basically, Lexicon-Grammar establishes morph syntactic and statistical sets of analytic rules to read and parse large textual corpora. The analytical procedure here described will prove itself appropriate for any type of digitalized text, and will represent a relevant support for the building and implementing of Semantic Web (SW) interactive platforms.

查看原文本刊更多论文

Cataloga:基于语义的术语数据挖掘软件

目录a是一个基于词典语法理论和实践分析框架，嵌入一个基于压缩术语电子词典的语言模块的软件包。我们将在这里展示如何使用Catalog a通过与基于术语的自动文本分析相关联的词汇本体来实现有效的数据挖掘和信息检索。此外，我们将展示准确的数据压缩对于构建高效的文本分析软件是多么必要。因此，我们将在这里讨论基于语义的术语数据挖掘软件的创建和功能，其中意大利语简单词和复合词电子词典起着至关重要的作用。Lexicon-Grammar是自然语言形式化和自动文本分析中最有效和最一致的方法之一，它是由法国语言学家Maurice Gross在60年代建立的，随后由Annibale Elia, Emilio D' agostino和Maurizio Martin Elli发展并应用于意大利语。基本上，Lexicon-Grammar建立了分析规则的词形、句法和统计集，以阅读和解析大型文本语料库。这里描述的分析过程将证明自己适用于任何类型的数字化文本，并将代表对语义网(SW)交互平台的构建和实现的相关支持。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2011 First International Conference on Data Compression, Communications and Processing

自引率

0.00%

发文量