Aligning Data Records Using WordNet
Jer Lang Hong, Eu-Gene Siew, S. Egerton
2010 Second International Conference on Computer Research and Development, published 2010-05-07
DOI: 10.1109/ICCRD.2010.79 (https://doi.org/10.1109/ICCRD.2010.79)
Citations: 4
Abstract
Current automatic wrappers that use the DOM tree to align data records have notable limitations, such as the inability to align iterative (repetitive and similar) and disjunctive (optional) data items. Our study of the properties of data records shows that these data items can be aligned based on their semantic properties. In this context, we propose an ontological technique that uses an existing lexical database for English (WordNet) to align data records. Regular expression rules are developed to align the extracted data items so that they can be used for further processing. Experimental results indicate that our technique is robust and performs better than existing state-of-the-art wrappers.
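
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of aligning data items by semantic label rather than by DOM position. Everything in it is an assumption made for illustration: the regex patterns in `REGEX_RULES`, the small `TOY_LEXICON` dictionary (a stand-in for a real WordNet lookup), and the function names `label_item` and `align_records`. It is not the authors' method.

```python
import re

# Hypothetical regex rules mapping a data item to a semantic label.
# These patterns are illustrative assumptions, not the rules from the paper.
REGEX_RULES = [
    ("price", re.compile(r"^\$\d+(?:\.\d{2})?$")),
    ("year", re.compile(r"^(?:19|20)\d{2}$")),
]

# Toy stand-in for a WordNet lookup: word -> broader concept (assumed data).
TOY_LEXICON = {
    "paperback": "format",
    "hardcover": "format",
    "used": "condition",
    "new": "condition",
}


def label_item(text):
    """Return a semantic label for one data item, or None if unknown."""
    item = text.strip()
    for label, pattern in REGEX_RULES:
        if pattern.match(item):
            return label
    return TOY_LEXICON.get(item.lower())


def align_records(records):
    """Align data items into columns keyed by semantic label.

    Optional (disjunctive) items missing from a record become None.
    Repeated (iterative) items sharing a label are grouped in one list.
    """
    columns = []   # labels in order of first appearance
    labelled = []  # one {label: [items]} dict per record
    for record in records:
        row = {}
        for item in record:
            label = label_item(item) or "unknown"
            row.setdefault(label, []).append(item)
            if label not in columns:
                columns.append(label)
        labelled.append(row)
    table = [[row.get(col) for col in columns] for row in labelled]
    return columns, table


records = [
    ["$12.99", "Paperback", "2008"],
    ["Hardcover", "$30.00"],   # no year: a disjunctive item is absent
    ["$5.00", "Used", "New"],  # two condition items: an iterative item
]
columns, table = align_records(records)
```

In this sketch the second record ends up with `None` in the year column even though its items appear in a different order, and the two condition values of the third record share one cell. A fuller version might replace `TOY_LEXICON` with hypernym lookups against WordNet itself, for example through NLTK's `wordnet` corpus reader.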