{"title":"Automatic data extraction of websites using data path matching and alignment","authors":"Yu-Chun Chu, Chiun-Chieh Hsu, Cheng-Jhe Lee, Yu-Ting Tsai","doi":"10.1109/ICDIPC.2015.7323006","DOIUrl":null,"url":null,"abstract":"Since most of web pages contain their main information in data records, extracting data records enables one to obtain and integrate data from diverse sources of Internet. Therefore, data extraction of web pages has been a popular research issue in the last decade. The paper aims to automatically extract data records from web pages and identify items from those extracted records. The proposed approach utilizes Data Path Matching to effectively extract data records and Data Path Code Alignment to efficiently identify data items. Experimental results reveal that the method can extract data effectively.","PeriodicalId":339685,"journal":{"name":"2015 Fifth International Conference on Digital Information Processing and Communications (ICDIPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 Fifth International Conference on Digital Information Processing and Communications (ICDIPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDIPC.2015.7323006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
Since most of web pages contain their main information in data records, extracting data records enables one to obtain and integrate data from diverse sources of Internet. Therefore, data extraction of web pages has been a popular research issue in the last decade. The paper aims to automatically extract data records from web pages and identify items from those extracted records. The proposed approach utilizes Data Path Matching to effectively extract data records and Data Path Code Alignment to efficiently identify data items. Experimental results reveal that the method can extract data effectively.