Research on Automatic Classification for Deep Web Query Interfaces
Peiguang Lin, Y. Du, Xiaohua Tan, Chao Lv
2008 International Symposiums on Information Processing, 2008-05-23
DOI: 10.1109/ISIP.2008.140 (https://doi.org/10.1109/ISIP.2008.140)
Citations: 13
Abstract
In recent years the Web has "deepened" rapidly, and users must browse many Web sites to access the Web databases of a specific domain. Building a unified query interface that integrates the query interfaces of a domain, so that various Web databases can be accessed at the same time, has therefore become an important issue. In this paper, we first analyze the schema characteristics of query interfaces and the attributes common to interfaces in the same domain, and give a new representation of a query interface. We then define "Form term" and "Function term" and, based on these two definitions, propose a new similarity computing algorithm, literal and semantic based similarity computing (LSSC). Second, we give a clustering algorithm for Deep Web query interfaces, LSSC-NQ, which combines LSSC with the NQ algorithm. Finally, experiments show that the algorithm computes similarity accurately and clusters query interfaces efficiently, reliably, and quickly.
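The abstract does not give the LSSC formula, so the following is only a rough sketch of the general idea of mixing a literal (string-level) score with a semantic (meaning-level) score when comparing the attribute labels of two query interfaces. Everything in it is an assumption made for illustration: the `SYNONYMS` table, the weight `w`, the function names, and the best-match averaging are hypothetical and are not taken from the paper. The NQ clustering step is not sketched because the abstract does not describe it.

```python
from difflib import SequenceMatcher

# Hypothetical synonym groups standing in for a real semantic resource.
SYNONYMS = [
    {"author", "writer"},
    {"title", "name"},
    {"price", "cost"},
]


def literal_sim(a: str, b: str) -> float:
    """String-level similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_sim(a: str, b: str) -> float:
    """1.0 if the labels are equal or share a synonym group, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return 1.0 if any(a in group and b in group for group in SYNONYMS) else 0.0


def term_sim(a: str, b: str, w: float = 0.5) -> float:
    """Weighted mix of literal and semantic similarity (w is an assumed weight)."""
    return w * literal_sim(a, b) + (1.0 - w) * semantic_sim(a, b)


def interface_sim(x: list[str], y: list[str], w: float = 0.5) -> float:
    """Average best-match score between two label lists, symmetrized."""
    if not x or not y:
        return 0.0

    def directed(p: list[str], q: list[str]) -> float:
        return sum(max(term_sim(a, b, w) for b in q) for a in p) / len(p)

    return (directed(x, y) + directed(y, x)) / 2.0


if __name__ == "__main__":
    books_a = ["title", "author", "price"]
    books_b = ["name", "writer", "cost"]
    cars = ["make", "model", "mileage"]
    print("books vs books:", interface_sim(books_a, books_b))
    print("books vs cars: ", interface_sim(books_a, cars))
```

With these toy inputs, the two book-search interfaces score higher than the book and car interfaces even though they share no identical labels, which is the kind of case a purely literal comparison would miss.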