Integration of HTML tables in web pages
Memen Akbar, F. N. Azizah, G. A. Putri Saptawati
2015 International Conference on Data and Software Engineering (ICoDSE), 2015-11-01
DOI: 10.1109/ICODSE.2015.7436985 (https://doi.org/10.1109/ICODSE.2015.7436985)
Citations: 1
Abstract
The growing number of Web pages on the Internet creates a need to combine information from HTML tables on different Web pages that contain similar information, especially within the same domain of interest, into a single Web page. This paper presents an approach to HTML table integration that combines several existing methods, each shown to solve a different issue in the integration process. The integration of HTML tables consists of three phases: (1) extraction of the structure of the tables; (2) integration of the tables' schemas; (3) integration of the data values. A domain ontology is used to resolve semantic and naming conflicts in the table schemas. To improve the quality of data value integration, the vector space model is used to detect duplicate data values. The result of the integration is a single HTML table. The approach is implemented in an engine built using Python. Experimental results show that the engine can successfully integrate two HTML tables into a single table.
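The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' engine: the `SYNONYMS` dictionary is a hypothetical stand-in for a domain ontology, the names `TableExtractor`, `canonical`, `cosine`, and `integrate` are invented for illustration, the 0.8 similarity threshold is an arbitrary assumption, and the vector space model is reduced to plain term-frequency vectors compared by cosine similarity.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class TableExtractor(HTMLParser):
    """Phase 1: collect table rows as lists of cell text."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(html):
    """Return (header, body_rows), assuming the first row is the header."""
    parser = TableExtractor()
    parser.feed(html)
    header, *body = parser.rows
    return header, body


# Hypothetical stand-in for a domain ontology: header variant -> canonical name.
SYNONYMS = {"name": "title", "book title": "title", "writer": "author"}


def canonical(header):
    """Phase 2 helper: resolve naming conflicts between column headers."""
    key = header.strip().lower()
    return SYNONYMS.get(key, key)


def cosine(a, b):
    """Cosine similarity of the term-frequency vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def integrate(html_a, html_b, threshold=0.8):
    """Merge two HTML tables into one schema and one list of rows."""
    head_a, rows_a = extract(html_a)
    head_b, rows_b = extract(html_b)

    # Phase 2: union of canonical column names, keeping first-seen order.
    schema = [canonical(h) for h in head_a]
    for h in head_b:
        if canonical(h) not in schema:
            schema.append(canonical(h))

    def to_records(header, rows):
        names = [canonical(h) for h in header]
        return [dict(zip(names, row)) for row in rows]

    def as_text(record):
        return " ".join(record.get(c, "") for c in schema)

    # Phase 3: append rows of the second table unless they look like duplicates.
    merged = to_records(head_a, rows_a)
    for rec in to_records(head_b, rows_b):
        if not any(cosine(as_text(rec), as_text(m)) >= threshold for m in merged):
            merged.append(rec)
    return schema, [[r.get(c, "") for c in schema] for r in merged]
```

In this sketch a detected duplicate is simply discarded, so any extra values it carries (for example a column present only in the second table) are lost; a fuller implementation would presumably merge the values of matching rows instead.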