Sebastian Müller, Jan Arne Sparka, Martin Kuban, Claudia Draxl, Lars Grunske
Grammar-based fuzzing of data integration parsers in computational materials science
Software: Practice and Experience, published 2023-09-19
DOI: 10.1002/spe.3266 (https://doi.org/10.1002/spe.3266)
Abstract
Computational materials science (CMS) focuses on in silico experiments to compute the properties of known and novel materials, and the community relies on many different software packages for this purpose. The NOMAD Laboratory (Draxl C, Scheffler) offers to store the input and output files of these packages in its FAIR data repository. Since the file formats of these software packages are not standardized, parsers are used to provide the results in a normalized format. The main goal of this article is to report our experience and findings from applying grammar-based fuzzing to these parsers. We constructed an input grammar for four common software packages in the CMS domain and experimentally evaluated the capability of grammar-based fuzzing to detect failures in the Novel Materials Discovery (NOMAD) parsers. With our approach, we identified three unique critical bugs concerning service availability, as well as several additional syntactic, semantic, logical, and downstream bugs in the investigated NOMAD parsers. We reported all issues to the developer team prior to publication. Based on the experience gained, we also recommend grammar-based fuzzing for other research software packages to improve trust in the correctness of the produced results.
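For readers unfamiliar with the technique, the general idea of grammar-based fuzzing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grammar, the "key = value" file format, and the functions `generate`, `toy_parser`, and `fuzz` are all hypothetical and are not taken from the article, from NOMAD, or from any CMS software package. The sketch derives random inputs from a grammar, feeds them to a parser, and records one triggering input per kind of uncaught exception.

```python
import random

# Hypothetical toy grammar for a made-up "key = value" output format.
# Symbols that appear as keys are nonterminals; anything else is literal text.
GRAMMAR = {
    "<file>": [["<line>"], ["<line>", "<file>"]],
    "<line>": [["<key>", " = ", "<value>", "\n"]],
    "<key>": [["energy"], ["volume"], ["natoms"]],
    "<value>": [["<number>"], ["<number>", " ", "<unit>"], ["NaN"], [""]],
    "<number>": [["<digit>"], ["<digit>", "<number>"], ["-", "<digit>"]],
    "<digit>": [[d] for d in "0123456789"],
    "<unit>": [["eV"], ["angstrom"]],
}


def generate(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` into a random string derived from GRAMMAR."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit literally
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take a shortest alternative so that
        # recursive rules are guaranteed to terminate.
        alternatives = [min(alternatives, key=len)]
    expansion = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in expansion)


def toy_parser(text):
    """Deliberately fragile stand-in for a real parser under test."""
    result = {}
    for line in text.splitlines():
        key, value = line.split(" = ")
        number = value.split()[0]  # raises IndexError on an empty value
        result[key] = float(number)  # silently accepts "NaN"
    return result


def fuzz(parser, runs=200, seed=0):
    """Run `parser` on generated inputs; keep one sample per exception type."""
    rng = random.Random(seed)
    failures = {}
    for _ in range(runs):
        sample = generate("<file>", rng)
        try:
            parser(sample)
        except Exception as exc:  # any uncaught exception counts as a failure
            failures.setdefault(type(exc).__name__, sample)
    return failures


if __name__ == "__main__":
    for name, sample in fuzz(toy_parser).items():
        print(f"{name}: triggered by {sample!r}")
```

In this toy setup, a crash on an empty value stands in for the kind of failure the abstract describes, while the silently accepted `NaN` hints at why semantic bugs need an additional oracle beyond "the parser did not crash".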