Authors: Muhammad Rio Bastian, A. Purwarianti
Published in: 2016 International Conference on Information Technology Systems and Innovation (ICITSI), 2016-10-01
DOI: 10.1109/ICITSI.2016.7858187 (https://doi.org/10.1109/ICITSI.2016.7858187)
Information extraction in statistics indicator tables using rule generalizations and ontology
The main problem with rule-based information extraction is that extraction rules tend to be designed for one specific kind of information or document structure; hence they cannot be reused for another without modification. Semi-structured documents such as tables present a further challenge: because there is no standard for how tables are designed, their structure varies. Statistics indicators are a source of information that uses tables as the means of data presentation. Indicators are also related to one another, and these relationships must be carefully identified and extracted. Rule generalization attempts to reduce the effort of modifying extraction rules by expressing the rules in general terms. Combined with an ontology, the rules can also extract the relationships between indicators. The output of the information extraction system is a database that stores not only the data itself but also the relationships between indicators.
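To make the idea of a generalized rule concrete, the following is a minimal sketch in Python; it is not the authors' implementation. It assumes a table is already available as a list of rows of strings, that the header row is the first row whose cells after the first column are all four-digit years, and that the ontology is a simple child-to-parent mapping. The names `ONTOLOGY`, `Fact`, `find_header_row`, `parse_number`, and `extract`, and the sample data, are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy ontology: child indicator -> parent indicator.
ONTOLOGY: Dict[str, str] = {
    "Male population": "Population",
    "Female population": "Population",
}

YEAR = re.compile(r"^\d{4}$")


@dataclass(frozen=True)
class Fact:
    indicator: str
    period: str
    value: float
    parent: Optional[str]  # related indicator from the ontology, if any


def find_header_row(table: List[List[str]]) -> int:
    """General rule: the header is the first row whose cells after the
    first column all look like periods (assumed here: four-digit years)."""
    for i, row in enumerate(table):
        cells = [c.strip() for c in row[1:]]
        if cells and all(YEAR.match(c) for c in cells):
            return i
    raise ValueError("no header row of periods found")


def parse_number(text: str) -> Optional[float]:
    """Return the cell as a float, or None if it is not numeric."""
    try:
        return float(text.replace(",", "").strip())
    except ValueError:
        return None


def extract(table: List[List[str]], ontology: Dict[str, str]) -> List[Fact]:
    """Turn every numeric cell below the header into a Fact and attach
    the parent indicator found in the ontology."""
    header = find_header_row(table)
    periods = [c.strip() for c in table[header][1:]]
    facts = []
    for row in table[header + 1:]:
        label = row[0].strip()
        for period, cell in zip(periods, row[1:]):
            value = parse_number(cell)
            if value is not None:
                facts.append(Fact(label, period, value, ontology.get(label)))
    return facts


# Hypothetical sample table with a caption row above the header.
sample = [
    ["Table 1. Population", "", ""],
    ["Indicator", "2014", "2015"],
    ["Population", "1,000", "1,100"],
    ["Male population", "510", "-"],
]
facts = extract(sample, ONTOLOGY)
```

Because the rule locates the header by its content and not by a fixed row index, the same code handles tables with or without caption rows above the header. Each resulting `Fact` carries both the value and its related indicator, which is the kind of record that could then be written to a database.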