Grammar-based testing for little languages: an experience report with student compilers

P. van Heerden, Moeketsi Raselimo, Konstantinos Sagonas, B. Fischer
DOI: 10.1145/3426425.3426946
Venue: Proceedings of the 13th ACM SIGPLAN International Conference on Software Language Engineering
Publication date: 2020-11-15
Citations: 5

Abstract

We report on our experience in using various grammar-based test suite generation methods to test 61 single-pass compilers that undergraduate students submitted for the practical project of a computer architecture course. We show that (1) all test suites constructed systematically following different grammar coverage criteria fall far behind the instructor's test suite in achieved code coverage, in the number of triggered semantic errors, and in detected failures and crashes; (2) a medium-sized positive random test suite triggers more crashes than the instructor's test suite, but achieves lower code coverage and triggers fewer non-crashing errors; and (3) a combination of the systematic and random test suites performs as well or better than the instructor's test suite in all aspects and identifies errors or crashes in every single submission. We then develop a lightweight extension of the basic grammar-based testing framework to capture contextual constraints, by encoding scoping and typing information as "semantic mark-up tokens" in the grammar rules. These mark-up tokens are interpreted by a small generic core engine when the tests are rendered, and tests with a syntactic structure that cannot be completed into a valid program by choosing appropriate identifiers are discarded. We formalize individual error models by overwriting individual mark-up tokens, and generate tests that are guaranteed to break specific contextual properties of the language. We show that a fully automatically generated random test suite with 15 error models achieves roughly the same coverage as the instructor's test suite, and outperforms it in the number of triggered semantic errors and detected failures and crashes. Moreover, all failing tests indicate real errors, and we have detected errors even in the instructor's reference implementation.
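The mechanism the abstract describes, rendering "semantic mark-up tokens" with a small core engine and discarding syntactic structures that cannot be completed into a valid program, can be illustrated with a toy sketch. Everything below is hypothetical: the grammar, the "@def"/"@use" token syntax, and the depth bound are illustrative stand-ins, not the paper's actual notation or engine.

```python
import random

# Toy grammar for a declare-then-use language. "@def" and "@use" play the
# role of the paper's semantic mark-up tokens carrying scoping information.
GRAMMAR = {
    "prog":   [["stmt"], ["stmt", "prog"]],
    "stmt":   [["decl"], ["assign"]],
    "decl":   [["var", "@def", ";"]],
    "assign": [["@use", "=", "0", ";"]],
}

def generate(symbol, scope, depth=0):
    """Expand `symbol` into a token list, or return None when the chosen
    syntactic structure cannot be completed into a valid program (an
    identifier use with nothing in scope): the abstract's discard step."""
    if symbol == "@def":                    # mark-up token: declare a name
        name = f"x{len(scope)}"
        scope.append(name)
        return [name]
    if symbol == "@use":                    # mark-up token: reference a name
        return [random.choice(scope)] if scope else None
    if symbol not in GRAMMAR:               # plain terminal
        return [symbol]
    alts = GRAMMAR[symbol]
    # bound recursion: past a depth limit, always take the shortest rule
    alt = min(alts, key=len) if depth > 6 else random.choice(alts)
    tokens = []
    for sym in alt:
        part = generate(sym, scope, depth + 1)
        if part is None:                    # propagate the discard upward
            return None
        tokens.extend(part)
    return tokens

# Collect 10 positive tests, discarding incompletable structures.
tests, attempts = [], 0
while len(tests) < 10 and attempts < 1000:
    attempts += 1
    t = generate("prog", [])
    if t is not None:
        tests.append(t)
print(len(tests), "positive tests; e.g.:", " ".join(tests[0]))
```

An error model in the paper's sense could then be obtained by overwriting a single token handler, for example making "@use" emit a fresh, undeclared name, so that every generated test is guaranteed to violate the scoping property under test.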