Grammar-based testing for little languages: an experience report with student compilers

P. van Heerden, Moeketsi Raselimo, Konstantinos Sagonas, B. Fischer
DOI: 10.1145/3426425.3426946
Venue: Proceedings of the 13th ACM SIGPLAN International Conference on Software Language Engineering
Publication date: 2020-11-15
Citations: 5

Abstract

We report on our experience in using various grammar-based test suite generation methods to test 61 single-pass compilers that undergraduate students submitted for the practical project of a computer architecture course. We show that (1) all test suites constructed systematically following different grammar coverage criteria fall far behind the instructor's test suite in achieved code coverage, in the number of triggered semantic errors, and in detected failures and crashes; (2) a medium-sized positive random test suite triggers more crashes than the instructor's test suite, but achieves lower code coverage and triggers fewer non-crashing errors; and (3) a combination of the systematic and random test suites performs as well or better than the instructor's test suite in all aspects and identifies errors or crashes in every single submission. We then develop a lightweight extension of the basic grammar-based testing framework to capture contextual constraints, by encoding scoping and typing information as "semantic mark-up tokens" in the grammar rules. These mark-up tokens are interpreted by a small generic core engine when the tests are rendered, and tests with a syntactic structure that cannot be completed into a valid program by choosing appropriate identifiers are discarded. We formalize individual error models by overwriting individual mark-up tokens, and generate tests that are guaranteed to break specific contextual properties of the language. We show that a fully automatically generated random test suite with 15 error models achieves roughly the same coverage as the instructor's test suite, and outperforms it in the number of triggered semantic errors and detected failures and crashes. Moreover, all failing tests indicate real errors, and we have detected errors even in the instructor's reference implementation.
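The mechanism the abstract describes, rendering "semantic mark-up tokens" with a small core engine and discarding syntactic structures that cannot be completed into a valid program, can be illustrated with a toy sketch. Everything below is hypothetical: the grammar, the "@def"/"@use" token syntax, and the depth bound are illustrative stand-ins, not the paper's actual notation or engine.

```python
import random

# Toy grammar for a declare-then-use language. "@def" and "@use" play the
# role of the paper's semantic mark-up tokens carrying scoping information.
GRAMMAR = {
    "prog":   [["stmt"], ["stmt", "prog"]],
    "stmt":   [["decl"], ["assign"]],
    "decl":   [["var", "@def", ";"]],
    "assign": [["@use", "=", "0", ";"]],
}

def generate(symbol, scope, depth=0):
    """Expand `symbol` into a token list, or return None when the chosen
    syntactic structure cannot be completed into a valid program (an
    identifier use with nothing in scope): the abstract's discard step."""
    if symbol == "@def":                    # mark-up token: declare a name
        name = f"x{len(scope)}"
        scope.append(name)
        return [name]
    if symbol == "@use":                    # mark-up token: reference a name
        return [random.choice(scope)] if scope else None
    if symbol not in GRAMMAR:               # plain terminal
        return [symbol]
    alts = GRAMMAR[symbol]
    # bound recursion: past a depth limit, always take the shortest rule
    alt = min(alts, key=len) if depth > 6 else random.choice(alts)
    tokens = []
    for sym in alt:
        part = generate(sym, scope, depth + 1)
        if part is None:                    # propagate the discard upward
            return None
        tokens.extend(part)
    return tokens

# Collect 10 positive tests, discarding incompletable structures.
tests, attempts = [], 0
while len(tests) < 10 and attempts < 1000:
    attempts += 1
    t = generate("prog", [])
    if t is not None:
        tests.append(t)
print(len(tests), "positive tests; e.g.:", " ".join(tests[0]))
```

An error model in the paper's sense could then be obtained by overwriting a single token handler, for example making "@use" emit a fresh, undeclared name, so that every generated test is guaranteed to violate the scoping property under test.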