{"title":"Automatic extraction of Web search interface based on visual features","authors":"Yulu Zhang, Jing Qiao","doi":"10.1109/ICIEA.2008.4582925","DOIUrl":null,"url":null,"abstract":"Ordinarily, a Web query interface can be considered as an interface schema containing multiple attributes and rich semantic/meta information, however, the schema is not formally defined in HTML. We observed that most Web pages have many visual cues to help distinguish different parts of the page. In this paper, we propose a novel approach to solve the extraction problem of query interface. Firstly, we propose a schema model for representing complex search interfaces. Secondly, we present an approach based on visual feature to automatically extract the search interfaces from Web page. It simulates how a user understands Web search interface based on its visual perception. Our experimental results indicate that the visual feature approach can work significantly better than the baselines in search interface extraction and achieve very high extraction accuracy.","PeriodicalId":309894,"journal":{"name":"2008 3rd IEEE Conference on Industrial Electronics and Applications","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 3rd IEEE Conference on Industrial Electronics and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIEA.2008.4582925","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
A Web query interface can ordinarily be regarded as an interface schema containing multiple attributes and rich semantic/meta information; however, this schema is not formally defined in HTML. We observe that most Web pages carry many visual cues that help distinguish the different parts of the page. In this paper, we propose a novel approach to the query interface extraction problem. First, we propose a schema model for representing complex search interfaces. Second, we present an approach based on visual features that automatically extracts search interfaces from Web pages. It simulates how a user understands a Web search interface through visual perception. Our experimental results indicate that the visual-feature approach works significantly better than the baselines in search interface extraction and achieves very high extraction accuracy.
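
The abstract does not spell out the schema model or the visual features used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each label and form field has already been rendered to a bounding box, and it pairs a field with the nearest label lying directly to its left or directly above it. All names here (`Box`, `Label`, `Field`, `Attribute`, `label_gap`, `extract_schema`) and the proximity rule itself are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Rendered bounding box in pixels (origin at the page's top-left)."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Label:
    text: str
    box: Box


@dataclass(frozen=True)
class Field:
    name: str  # e.g. the HTML "name" attribute
    kind: str  # e.g. "text", "select", "submit"
    box: Box


@dataclass(frozen=True)
class Attribute:
    """One schema attribute: a form field plus the label describing it."""
    label: Optional[str]
    field: Field


def label_gap(label: Box, field: Box) -> Optional[float]:
    """Pixel gap between a label and a field, or None if not a candidate.

    Assumed rule: a label is a candidate only if it sits directly to the
    left of the field (sharing some vertical extent) or directly above it
    (sharing some horizontal extent).
    """
    rows_overlap = label.y < field.y + field.h and field.y < label.y + label.h
    cols_overlap = label.x < field.x + field.w and field.x < label.x + label.w
    if label.x + label.w <= field.x and rows_overlap:
        return field.x - (label.x + label.w)
    if label.y + label.h <= field.y and cols_overlap:
        return field.y - (label.y + label.h)
    return None


def extract_schema(labels: List[Label], fields: List[Field]) -> List[Attribute]:
    """Attach to each field the closest candidate label, if any."""
    schema = []
    for field in fields:
        best_text, best_gap = None, None
        for label in labels:
            gap = label_gap(label.box, field.box)
            if gap is not None and (best_gap is None or gap < best_gap):
                best_text, best_gap = label.text, gap
        schema.append(Attribute(label=best_text, field=field))
    return schema


if __name__ == "__main__":
    labels = [
        Label("Title", Box(10, 10, 50, 20)),
        Label("Author", Box(10, 40, 50, 20)),
    ]
    fields = [
        Field("title", "text", Box(70, 10, 150, 20)),
        Field("author", "text", Box(70, 40, 150, 20)),
        Field("go", "submit", Box(300, 200, 50, 20)),
    ]
    for attr in extract_schema(labels, fields):
        print(attr.field.name, "<-", attr.label)
```

A layout-based rule like this depends only on rendered positions, so it does not require the label and field to be adjacent in the HTML source. A real system would need further cues (fonts, alignment, grouping of related fields) that the abstract only alludes to.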