Distributed Component-Based Crawler for AJAX Applications
Authors: Suryanshu Raj, R. Krishna, A. Nayak
Published in: 2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC), 2018-10-02
DOI: 10.1109/ICAECC.2018.8479454 (https://doi.org/10.1109/ICAECC.2018.8479454)
Crawling web applications is important both for indexing websites and for testing them for vulnerabilities. Research on crawling traditional websites has made significant progress, and many software suites can carry out deep crawls of large traditional websites in limited time. Modern AJAX (Asynchronous JavaScript and XML) based websites, however, cannot be crawled by traditional crawlers. The area remains open to research, and many open-source software suites are being developed, but those developed so far still suffer from state space explosion, poor time efficiency and incomplete content coverage. This research work aims to develop a distributed component-based crawler for deterministic AJAX applications that reduces state space explosion, improves time efficiency and provides complete content coverage. The solution combines three approaches. First, a component-based approach reduces state space explosion. Second, a distributed-crawling approach processes events concurrently to improve time efficiency. Third, a Breadth First Search (BFS) strategy provides complete content coverage.
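The three ideas named in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation, which the abstract does not describe. It assumes a hypothetical page made of three independent components, each modeled as a deterministic state machine, and uses a local thread pool as a stand-in for distributed crawler nodes. The names `COMPONENTS`, `INITIAL`, `fire_event`, `crawl_component` and `crawl` are all invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model (an assumption, not from the paper): each component
# is a deterministic state machine mapping state -> {event name: next state}.
COMPONENTS = {
    "menu": {"closed": {"click": "open"}, "open": {"click": "closed"}},
    "tabs": {"tab1": {"next": "tab2"}, "tab2": {"next": "tab3"},
             "tab3": {"next": "tab1"}},
    "news": {"page1": {"more": "page2"}, "page2": {}},
}
INITIAL = {"menu": "closed", "tabs": "tab1", "news": "page1"}


def fire_event(transitions, state, event):
    """Stand-in for executing one event in a browser and reading the result."""
    return transitions[state][event]


def crawl_component(transitions, start, pool):
    """Breadth-first crawl of one component.

    All events of the current BFS level are submitted to the pool together,
    so they are processed concurrently; the next level starts only after
    every event of the current level has returned.
    """
    seen = {start}
    order = [start]
    frontier = [start]
    while frontier:
        jobs = [(s, e) for s in frontier for e in sorted(transitions[s])]
        # pool.map preserves input order, so discovery order is deterministic.
        results = pool.map(lambda job: fire_event(transitions, *job), jobs)
        frontier = []
        for next_state in results:
            if next_state not in seen:
                seen.add(next_state)
                order.append(next_state)
                frontier.append(next_state)
    return order


def crawl(components, initial, workers=4):
    """Crawl every component separately instead of the combined page state."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return {name: crawl_component(transitions, initial[name], pool)
                for name, transitions in components.items()}


discovered = crawl(COMPONENTS, INITIAL)

# Component-wise crawling visits the sum of the component state counts;
# treating the page as one monolithic state would face their product.
component_states = sum(len(states) for states in discovered.values())
product_states = 1
for states in discovered.values():
    product_states *= len(states)

print(discovered)
print(component_states, "component states vs", product_states, "combined states")
```

In this toy model the gap between the sum and the product of component state counts is what "state space explosion" refers to, and it widens quickly as components are added. The sketch relies on the components being independent and the application being deterministic; a real crawler would also have to detect component boundaries in the DOM and replace `fire_event` with actual browser automation.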