Authors: Akilandeswari Jeyapal, N. Gopalan
Venue: 2008 Third International Conference on Internet and Web Applications and Services
Publication date: 2008-06-08
DOI: 10.1109/ICIW.2008.94 (https://doi.org/10.1109/ICIW.2008.94)
An Architectural Framework of a Crawler for Locating Deep Web Repositories Using Learning Multi-agent Systems
The World Wide Web (WWW) has become one of the largest and most readily accessible repositories of human knowledge. Traditional search engines index only the surface Web, whose pages are easily found. Attention has now shifted to the invisible or hidden Web, which consists of large warehouses of useful data such as images, sounds, presentations, and many other types of media. To use such data, a specialized program is needed to locate those sites, much as search engines locate surface Web pages. This paper presents a design for a hidden Web crawler that can autonomously discover pages from the hidden Web by employing a multi-agent Web mining system. A theoretical framework is proposed for investigating the resource discovery problem, and the empirical results suggest a substantial improvement in crawling strategy and harvest rate.
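
The abstract does not define harvest rate. In the focused-crawling literature it is usually taken to be the fraction of fetched pages that are relevant to the crawl target, and the minimal sketch below works under that assumption. The function name `harvest_rate`, the relevance predicate, and the example URLs are hypothetical and are not taken from the paper.

```python
def harvest_rate(fetched_pages, is_relevant):
    """Fraction of fetched pages judged relevant (0.0 if nothing was fetched)."""
    fetched = list(fetched_pages)
    if not fetched:
        return 0.0
    relevant = sum(1 for page in fetched if is_relevant(page))
    return relevant / len(fetched)


# Hypothetical crawl log: (URL, whether the page was judged a hidden Web entry point).
crawl_log = [
    ("http://example.org/search", True),
    ("http://example.org/about", False),
    ("http://example.org/catalog/query", True),
    ("http://example.org/contact", False),
]

# The relevance judgment is assumed to be precomputed and stored in the log.
rate = harvest_rate(crawl_log, is_relevant=lambda entry: entry[1])
print(rate)
```

Under this definition, a crawler that fetches fewer irrelevant pages to reach the same number of relevant ones has a higher harvest rate, which is one way a claim of improved crawling strategy could be quantified.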