Generating three binary addition algorithms using reinforcement programming

ACM SE '10. Pub Date: 2010-04-15. DOI: 10.1145/1900008.1900072

S. White, T. Martinez, G. Rudolph


Citations: 3

Abstract

Reinforcement Programming (RP) is a new technique for automatically generating a computer program using reinforcement learning methods. This paper describes how RP learned to generate code for three binary addition problems: simulating a full adder circuit, incrementing a binary number, and adding two binary numbers. Each problem is presented as an extension of the previous one, which provides an introduction to the practical application of RP. Each solution uses a dynamic, episodic form of the delayed Q-Learning algorithm. "Dynamic" means that the algorithm grows the policy during learning and prunes it before the policy is translated to source code. This differs from Q-Learning models that use fixed-size tables or neural net function approximators to store q-values associated with (state, action) pairs. The states, actions, rewards, other parameters, and results of experiments are presented for each of the three problems.
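As a rough illustration of the ideas in the abstract, the Python sketch below first writes out the three target behaviors (full adder, increment, add) as plain reference functions, then shows a Q-table that grows as states are visited and can be pruned afterwards. Everything here is an assumption made for illustration: the names `DynamicQTable`, `choose`, `update`, `prune`, and `greedy_policy` are hypothetical, the bit lists are little-endian by choice, and the update rule is the standard one-step Q-learning update rather than the delayed variant the paper uses. The abstract does not give the paper's states, actions, or rewards, so none are modeled here.

```python
import random


def full_adder(a, b, carry_in):
    """Return (sum_bit, carry_out) for three input bits."""
    total = a + b + carry_in
    return total % 2, total // 2


def increment(bits):
    """Add one to a little-endian bit list (least significant bit first)."""
    out, carry = [], 1
    for bit in bits:
        s, carry = full_adder(bit, 0, carry)
        out.append(s)
    return out + [carry] if carry else out


def add(x_bits, y_bits):
    """Add two little-endian bit lists by chaining full adders."""
    out, carry = [], 0
    for i in range(max(len(x_bits), len(y_bits))):
        a = x_bits[i] if i < len(x_bits) else 0
        b = y_bits[i] if i < len(y_bits) else 0
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry] if carry else out


class DynamicQTable:
    """Q-values keyed by visited state. Hypothetical helper, not the paper's code."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = {}         # state -> {action: value}; starts empty

    def values(self, state):
        # Growing step: a state gets a row only when it is first seen.
        if state not in self.q:
            self.q[state] = {a: 0.0 for a in self.actions}
        return self.q[state]

    def choose(self, state, epsilon, rng=random):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            return rng.choice(self.actions)
        vals = self.values(state)
        return max(self.actions, key=lambda a: vals[a])

    def update(self, state, action, reward, next_state, done):
        # Standard one-step Q-learning update (not the delayed variant).
        target = reward
        if not done:
            target += self.gamma * max(self.values(next_state).values())
        vals = self.values(state)
        vals[action] += self.alpha * (target - vals[action])

    def prune(self, reachable_states):
        # Pruning step: keep only the states the final policy can reach.
        self.q = {s: v for s, v in self.q.items() if s in reachable_states}

    def greedy_policy(self):
        # One action per remaining state; this is what would become code.
        return {s: max(self.actions, key=lambda a: v[a])
                for s, v in self.q.items()}
```

Storing rows in a dictionary means memory is spent only on states actually encountered, in contrast to the fixed-size tables the abstract mentions. How the set of reachable states is computed before pruning is left open here, since the abstract does not describe it.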
