{"title":"语义相似度在基于爬虫的Web应用程序测试中的应用","authors":"Jun-Wei Lin, Farn Wang, Paul Chu","doi":"10.1109/ICST.2017.20","DOIUrl":null,"url":null,"abstract":"To automatically test web applications, crawling-based techniques are usually adopted to mine the behavior models, explore the state spaces or detect the violated invariants of the applications. However, their broad use is limited by the required manual configurations for input value selection, GUI state comparison and clickable detection. In existing crawlers, the configurations are usually string-matching based rules looking for tags or attributes of DOM elements, and often application-specific. Moreover, in input topic identification, it can be difficult to determine which rule suggests a better match when several rules match an input field to more than one topic. This paper presents a natural-language approach based on semantic similarity to address the above issues. The proposed approach represents DOM elements as vectors in a vector space formed by the words used in the elements. The topics of encountered input fields during crawling can then be inferred by their similarities with ones in a labeled corpus. Semantic similarity can also be applied to suggest if a GUI state is newly discovered and a DOM element is clickable under an unsupervised learning paradigm. We evaluated the proposed approach in input topic identification with 100 real-world forms and GUI state comparison with real data from industry. Our evaluation shows that the proposed approach has comparable or better performance to the conventional techniques. Experiments in input topic identification also show that the accuracy of the rule-based approach can be improved by up to 22% when integrated with our approach.","PeriodicalId":112258,"journal":{"name":"2017 IEEE International Conference on Software Testing, Verification and Validation (ICST)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Using Semantic Similarity in Crawling-Based Web Application Testing\",\"authors\":\"Jun-Wei Lin, Farn Wang, Paul Chu\",\"doi\":\"10.1109/ICST.2017.20\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To automatically test web applications, crawling-based techniques are usually adopted to mine the behavior models, explore the state spaces or detect the violated invariants of the applications. However, their broad use is limited by the required manual configurations for input value selection, GUI state comparison and clickable detection. In existing crawlers, the configurations are usually string-matching based rules looking for tags or attributes of DOM elements, and often application-specific. Moreover, in input topic identification, it can be difficult to determine which rule suggests a better match when several rules match an input field to more than one topic. This paper presents a natural-language approach based on semantic similarity to address the above issues. The proposed approach represents DOM elements as vectors in a vector space formed by the words used in the elements. The topics of encountered input fields during crawling can then be inferred by their similarities with ones in a labeled corpus. Semantic similarity can also be applied to suggest if a GUI state is newly discovered and a DOM element is clickable under an unsupervised learning paradigm. We evaluated the proposed approach in input topic identification with 100 real-world forms and GUI state comparison with real data from industry. Our evaluation shows that the proposed approach has comparable or better performance to the conventional techniques. Experiments in input topic identification also show that the accuracy of the rule-based approach can be improved by up to 22% when integrated with our approach.\",\"PeriodicalId\":112258,\"journal\":{\"name\":\"2017 IEEE International Conference on Software Testing, Verification and Validation (ICST)\",\"volume\":\"118 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-03-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Conference on Software Testing, Verification and Validation (ICST)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICST.2017.20\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Software Testing, Verification and Validation (ICST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICST.2017.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using Semantic Similarity in Crawling-Based Web Application Testing
To automatically test web applications, crawling-based techniques are usually adopted to mine the behavior models, explore the state spaces or detect the violated invariants of the applications. However, their broad use is limited by the required manual configurations for input value selection, GUI state comparison and clickable detection. In existing crawlers, the configurations are usually string-matching based rules looking for tags or attributes of DOM elements, and often application-specific. Moreover, in input topic identification, it can be difficult to determine which rule suggests a better match when several rules match an input field to more than one topic. This paper presents a natural-language approach based on semantic similarity to address the above issues. The proposed approach represents DOM elements as vectors in a vector space formed by the words used in the elements. The topics of encountered input fields during crawling can then be inferred by their similarities with ones in a labeled corpus. Semantic similarity can also be applied to suggest if a GUI state is newly discovered and a DOM element is clickable under an unsupervised learning paradigm. We evaluated the proposed approach in input topic identification with 100 real-world forms and GUI state comparison with real data from industry. Our evaluation shows that the proposed approach has comparable or better performance to the conventional techniques. Experiments in input topic identification also show that the accuracy of the rule-based approach can be improved by up to 22% when integrated with our approach.