Information extraction in statistics indicator tables using rule generalizations and ontology

2016 International Conference on Information Technology Systems and Innovation (ICITSI). Pub Date: 2016-10-01. DOI: 10.1109/ICITSI.2016.7858187

Muhammad Rio Bastian, A. Purwarianti


Citations: 4


Information extraction in statistics indicator tables using rule generalizations and ontology

The main problem with rule-based information extraction is that extraction rules tend to be designed for one specific kind of information or document structure; hence they cannot be reused on another without modification. Semi-structured documents such as tables present a further challenge: because there is no standard for how tables are designed, their structure varies. Statistics indicators are a source of information that uses tables as the means of data presentation. Statistics indicators are also related to one another, and these relationships must be carefully identified and extracted. Generalized rules attempt to reduce the effort of modifying extraction rules by expressing the rules in general terms. Combined with an ontology, the rules can also extract the relationships between indicators. The output of the information extraction system is a database that stores not only the data itself but also the relationships between indicators.
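The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes a table arrives as a list of rows of strings, that a "generalized rule" can be approximated by a pattern describing what a header cell looks like (here, a four-digit year) rather than where it sits, and that the ontology can be reduced to a parent lookup. The names `ONTOLOGY`, `YEAR_PATTERN`, `find_header_row`, and `extract`, as well as the sample table values, are all hypothetical.

```python
import re

# Hypothetical ontology: each indicator maps to its parent indicator
# (None means the indicator has no parent).
ONTOLOGY = {
    "population": None,
    "male population": "population",
    "female population": "population",
}

# A generalized rule (assumption): describe what a header cell looks like
# instead of hard-coding its position in one particular table layout.
YEAR_PATTERN = re.compile(r"^(19|20)\d{2}$")


def find_header_row(table):
    """Return the index of the first row whose cells after the first are all years."""
    for index, row in enumerate(table):
        cells = [cell.strip() for cell in row[1:]]
        if cells and all(YEAR_PATTERN.match(cell) for cell in cells):
            return index
    raise ValueError("no header row with year columns found")


def extract(table, ontology):
    """Build one record per (indicator, year) cell, tagged with the indicator's parent."""
    header_index = find_header_row(table)
    years = [int(cell.strip()) for cell in table[header_index][1:]]
    records = []
    for row in table[header_index + 1:]:
        if not row:
            continue  # skip empty rows
        label = row[0].strip().lower()
        if label not in ontology:
            continue  # skip notes, captions, and labels the ontology does not know
        for year, cell in zip(years, row[1:]):
            records.append({
                "indicator": label,
                "parent": ontology[label],
                "year": year,
                "value": float(cell.replace(",", "")),
            })
    return records


if __name__ == "__main__":
    # Made-up sample table; the caption and source rows are ignored by the rule.
    sample = [
        ["Table 1. Population"],
        ["Indicator", "2014", "2015"],
        ["Population", "1,000", "1,100"],
        ["Male population", "480", "530"],
        ["Source: hypothetical"],
    ]
    for record in extract(sample, ONTOLOGY):
        print(record)
```

Because the header rule matches on cell content, the same code would still find the header if a caption or blank row were added above it, which is the kind of reuse the abstract attributes to generalized rules. The `parent` field stands in for the relationship information that the abstract says is stored alongside the data.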
