Information extraction in statistics indicator tables using rule generalizations and ontology

Muhammad Rio Bastian, A. Purwarianti
2016 International Conference on Information Technology Systems and Innovation (ICITSI), published 2016-10-01
DOI: 10.1109/ICITSI.2016.7858187 (https://doi.org/10.1109/ICITSI.2016.7858187)
Citations: 4

Abstract

The main problem with rule-based information extraction is that extraction rules tend to be designed for a specific kind of information or document structure; hence they cannot be applied directly to a different one without modification. Semi-structured documents such as tables present a further challenge: because there is no standard for how tables are designed, their structure varies. Statistics indicators are a source of information that uses tables as a means of data presentation. The indicators are also related to one another, and these relationships must be carefully identified and extracted. Generalization rules attempt to reduce the effort of modifying extraction rules by expressing the rules in general terms. Combined with an ontology, the rules can also extract the relationships between indicators. The output of the information extraction system is a database that stores not only the data itself but also the relationships between indicators.
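The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a table is already available as a list of rows of strings, that the ontology can be reduced to a mapping from each indicator to its parent concept, and that the header is the row directly above the first indicator row. The names `ONTOLOGY`, `extract`, `store`, the two database tables, and all sample values are hypothetical.

```python
import sqlite3

# Hypothetical ontology: indicator name -> parent concept (None = top level).
ONTOLOGY = {
    "Population": None,
    "Male population": "Population",
    "Female population": "Population",
}


def parse_number(cell):
    """Return the cell as a float, or None if it is not numeric."""
    try:
        return float(cell.replace(",", ""))
    except ValueError:
        return None


def extract(table):
    """General rule: data rows are rows whose first cell names a known
    indicator; the header is assumed to be the row just above the first one.
    Title and footnote rows are skipped without layout-specific rules."""
    first = next(
        (i for i, row in enumerate(table) if row and row[0].strip() in ONTOLOGY),
        None,
    )
    if first is None or first == 0:
        raise ValueError("no indicator rows with a header row above them")
    header = [cell.strip() for cell in table[first - 1]]
    records = []
    for row in table[first:]:
        name = row[0].strip()
        if name not in ONTOLOGY:
            continue  # e.g. a source note at the bottom of the table
        for period, cell in zip(header[1:], row[1:]):
            number = parse_number(cell)
            if number is not None:
                records.append((name, ONTOLOGY[name], period, number))
    return records


def store(records, conn):
    """Keep both the values and the indicator-to-parent relationships."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indicator (name TEXT PRIMARY KEY, parent TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation (name TEXT, period TEXT, val REAL)"
    )
    for name, parent, period, number in records:
        conn.execute("INSERT OR IGNORE INTO indicator VALUES (?, ?)", (name, parent))
        conn.execute("INSERT INTO observation VALUES (?, ?, ?)", (name, period, number))
    conn.commit()


# Made-up sample table with a title row and a footnote row.
table = [
    ["Table 1. Population by sex", "", ""],
    ["Indicator", "2014", "2015"],
    ["Population", "1,000", "1,100"],
    ["Male population", "510", "560"],
    ["Female population", "490", "540"],
    ["Source: hypothetical data", "", ""],
]
records = extract(table)
conn = sqlite3.connect(":memory:")
store(records, conn)
```

Because the rule keys on indicator names from the ontology instead of fixed row or column positions, the same function would handle a table with extra title or note rows, which is the kind of reuse the abstract attributes to generalized rules.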