Complementing Machine Learning Classifiers via Dynamic Symbolic Execution: "Human vs. Bot Generated" Tweets

2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE). Pub Date: 2018-05-28. DOI: 10.1145/3194104.3194111

S. L. Shrestha, Saroj Panda, Christoph Csallner


Citations: 2

Abstract

Recent machine learning approaches for classifying text as human-written or bot-generated rely on training sets that are large, labeled diligently, and representative of the underlying domain. While valuable, these machine learning approaches ignore programs as an additional source of such training sets. To address this problem of incomplete training sets, this paper proposes to systematically supplement existing training sets with samples inferred via program analysis. In our preliminary evaluation, training sets enriched with samples inferred via dynamic symbolic execution were able to improve machine learning classifier accuracy for simple string-generating programs.
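The idea of enriching a training set with program-derived samples can be illustrated with a minimal sketch. This is not the authors' implementation: `bot_tweet`, `explore_paths`, and `augment` are hypothetical names, the bot is a toy, and exhaustive enumeration of a tiny input domain stands in for a real dynamic symbolic execution engine, which would instead solve path constraints to find one concrete input per execution path.

```python
from itertools import product


def bot_tweet(topic: int, upbeat: bool, tagged: bool) -> str:
    """Hypothetical string-generating bot with three input-dependent branches."""
    subject = "Bitcoin" if topic == 0 else "Weather"
    verb = "is soaring" if upbeat else "is falling"
    text = f"{subject} {verb} today"
    if tagged:
        text += " #update"
    return text


def explore_paths(program):
    """Stand-in for dynamic symbolic execution (assumption, not a real engine).

    The input domain is small enough that every branch combination can be
    driven by simply enumerating all concrete inputs.
    """
    outputs = set()
    for topic, upbeat, tagged in product((0, 1), (False, True), (False, True)):
        outputs.add(program(topic, upbeat, tagged))
    return sorted(outputs)


def augment(training_set, program):
    """Append program-derived strings labeled "bot", skipping texts already present."""
    seen = {text for text, _ in training_set}
    extra = [(text, "bot") for text in explore_paths(program) if text not in seen]
    return training_set + extra


# Illustrative hand-labeled seed data (made up for this sketch).
labeled = [
    ("just had the best coffee with my sister", "human"),
    ("Bitcoin is soaring today", "bot"),
]
enriched = augment(labeled, bot_tweet)
print(len(labeled), "->", len(enriched))
```

The enriched list could then be passed to any text classifier in place of the original training set; whether accuracy improves depends on the programs and data involved.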

