Learning-based Identification of Coding Best Practices from Software Documentation

2022 IEEE International Conference on Software Maintenance and Evolution (ICSME). Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00073

Neela Sawant, Srinivasan H. Sengamedu


Citations: 1

Abstract

Automatic identification of coding best practices can scale the development of code and application analyzers. We present Doc2BP, a deep learning tool to identify coding best practices in software documentation. Natural language descriptions are mapped to an informative embedding space, optimized under the dual objectives of binary and few-shot classification. The binary objective powers general classification into known best practice categories using a deep learning classifier. The few-shot objective facilitates example-based classification into novel categories by matching embeddings with user-provided examples at run-time, without having to retrain the underlying model. We analyze the effects of manually and synthetically labeled examples, context, and cross-domain information.

We have applied Doc2BP to Java, Python, AWS Java SDK, and AWS CloudFormation documentation. Compared with prior work that primarily leverages keyword heuristics, and with our own part-of-speech pattern baselines, we obtain a 3-5% F1 score improvement for Java and Python, and 15-20% for AWS Java SDK and AWS CloudFormation. Experiments with four few-shot use cases show promising results (5-shot accuracy of 99%+ for Java NullPointerException and AWS Java metrics, 65% for AWS CloudFormation numerics, and 35% for Python best practices).

Doc2BP has contributed new rules and improved specifications in Amazon's code and application analyzers: (a) 500+ new checks in cfn-lint, an open-source AWS CloudFormation linter, (b) over 97% automated coverage of metrics APIs and related practices in Amazon DevOps Guru, (c) support for nullable AWS APIs in Amazon CodeGuru's Java NullPointerException (NPE) detector, (d) 200+ new best practices for Java, Python, and respective AWS SDKs in Amazon CodeGuru, and (e) a 2% reduction in false positives in Amazon CodeGuru's Java resource leak detector.
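The few-shot matching step described in the abstract can be illustrated with a minimal sketch. This is not Doc2BP's implementation: the `embed` function below is a toy bag-of-words stand-in for the learned embedding, and the labels, example sentences, and the average-cosine matching rule are all assumptions made for illustration. The point it shows is that a new category is added by supplying a few examples at run-time, with no retraining.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned sentence encoder (assumption):
    a sparse bag-of-words count vector over lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_few_shot(sentence, support):
    """Score each label by the mean similarity between the sentence
    and that label's user-provided examples; return the best label."""
    query = embed(sentence)
    scores = {
        label: sum(cosine(query, embed(ex)) for ex in examples) / len(examples)
        for label, examples in support.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical support set: a few examples per category, supplied at run-time.
SUPPORT = {
    "nullable_return": [
        "Returns null if the key does not exist.",
        "This method may return null when no value is present.",
    ],
    "other": [
        "Creates a new client with default settings.",
        "The region is configured through the builder.",
    ],
}

label, scores = classify_few_shot(
    "getObject returns null if the object is missing.", SUPPORT
)
print(label, scores)
```

Swapping `embed` for a trained encoder would leave `classify_few_shot` unchanged, which is the property that makes example-based classification into novel categories possible without retraining.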

