Compositional Transfer in Hierarchical Reinforcement Learning
Markus Wulfmeier, A. Abdolmaleki, Roland Hafner, J. T. Springenberg, Michael Neunert, Tim Hertweck, T. Lampe, Noah Siegel, N. Heess, Martin A. Riedmiller
arXiv: Learning. Published 2019-06-26. DOI: 10.15607/rss.2020.xvi.054 (https://doi.org/10.15607/rss.2020.xvi.054)

Abstract: The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data efficiency in domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks, as well as for scheduling tasks. The presented algorithm enables stable and fast learning in complex, real-world domains, in both the parallel multitask and the sequential transfer setting. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference, and we evaluate the benefits of additional incentives for efficient, compositional task solutions in single-task domains. Finally, we demonstrate substantial gains in data efficiency and final performance over competitive baselines in a week-long, physical robot stacking experiment.
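The cross-task data sharing the abstract describes can be made concrete with a minimal sketch (not the paper's implementation): a replay buffer that stores each off-policy transition once and relabels it with every task's reward, so experience collected for one task is reusable by all of them. The `Transition` layout and the `reward_fns` mapping are illustrative assumptions.

```python
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action next_state rewards")

class SharedReplayBuffer:
    """Replay buffer shared across tasks: each transition is stored once,
    annotated with a reward for every task, so any task can sample it."""

    def __init__(self, reward_fns, capacity=10000):
        self.reward_fns = reward_fns  # task name -> reward function
        self.buffer = []
        self.capacity = capacity

    def add(self, state, action, next_state):
        # Relabel the single transition with every task's reward.
        rewards = {task: fn(state, action, next_state)
                   for task, fn in self.reward_fns.items()}
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append(Transition(state, action, next_state, rewards))

    def sample(self, task, batch_size):
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        return [(t.state, t.action, t.next_state, t.rewards[task]) for t in batch]

# Usage with two hypothetical tasks over a scalar state:
buf = SharedReplayBuffer({"reach": lambda s, a, ns: -abs(ns),
                          "avoid": lambda s, a, ns: abs(ns)})
buf.add(0.0, 1.0, 0.5)
```

The same stored transition yields a different training tuple depending on which task samples it, which is the sense in which platform time is shared.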
Developing an ANFIS-PSO Based Model to Estimate Mercury Emission in Combustion Flue Gases
S. Shamshirband, A. Baghban, Masoud Hadipoor, A. Mosavi
arXiv: Learning. Published 2019-05-10. DOI: 10.20944/PREPRINTS201905.0124.V1 (https://doi.org/10.20944/PREPRINTS201905.0124.V1)

Abstract: Accurate prediction of the mercury content emitted from fossil-fueled power stations is of utmost importance for environmental pollution assessment and hazard mitigation. In this paper, the mercury content in the output gas from boilers was predicted using an Adaptive Neuro-Fuzzy Inference System (ANFIS) integrated with particle swarm optimization (PSO). Input parameters were selected from coal characteristics and the operational configuration of the boilers. The proposed ANFIS-PSO model captures, as a nonlinear model, the dependence of flue-gas mercury content on the specifications of the coal and the boiler type. In this study, operational information from 82 power plants was gathered and used to train and test the proposed model. To evaluate the model's performance, the mean absolute relative error (MARE%) was computed, yielding 0.003266 for training and 0.013272 for testing. Furthermore, relative errors between measured data and predicted values were between -0.25% and 0.1%, which confirms the accuracy of the ANFIS-PSO model.
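The PSO component can be illustrated generically. The sketch below is a standard particle swarm minimizer, not the authors' ANFIS coupling; in their setting, `objective` would stand in for the ANFIS training error as a function of the fuzzy-system parameters being tuned.

```python
import random

def pso(objective, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimizer: returns (best position, best value)."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # per-particle best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Usage on a toy objective (the sphere function, standing in for a model's error):
random.seed(0)
best, val = pso(lambda x: sum(v * v for v in x), dim=2)
```

The inertia and attraction coefficients (`w`, `c1`, `c2`) are conventional defaults, not values from the paper.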
Actor-Expert: A Framework for using Q-learning in Continuous Action Spaces
Sungsu Lim
arXiv: Learning. Published 2018-10-22. DOI: 10.7939/R3-QGDP-3872 (https://doi.org/10.7939/R3-QGDP-3872)

Abstract: Q-learning can be difficult to use in continuous action spaces, because an optimization has to be solved to find the maximal action for the action-values. A common strategy has been to restrict the functional form of the action-values to be concave in the actions, to simplify the optimization. Such restrictions, however, can prevent learning accurate action-values. In this work, we propose a new policy search objective that facilitates using Q-learning, and a framework to optimize this objective, called Actor-Expert. The Expert uses Q-learning to update the action-values towards optimal action-values. The Actor learns the maximal actions over time for these changing action-values. We develop a Cross Entropy Method (CEM) for the Actor, where such a global optimization approach facilitates the use of generically parameterized action-values. This method, which we call Conditional CEM, iteratively concentrates density around maximal actions, conditioned on state. We prove that this algorithm tracks the expected CEM update over states with changing action-values. We demonstrate in a toy environment that previous methods that restrict the action-value parameterization fail, whereas Actor-Expert with a more general action-value parameterization succeeds. Finally, we demonstrate that Actor-Expert performs as well as or better than competitors on four benchmark continuous-action environments.
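The core CEM step — iteratively concentrating a sampling density around maximal actions of a fixed Q — can be sketched for a single state with a 1-D action. This is a simplification of the paper's Conditional CEM (which conditions on state and tracks changing action-values); the quadratic `q` in the usage is a stand-in.

```python
import math
import random

def cem_max_action(q, n_samples=64, n_elite=8, iters=20,
                   mu=0.0, sigma=2.0, min_sigma=1e-3):
    """Cross-entropy search for argmax_a q(a) over a 1-D continuous action.
    Samples actions from a Gaussian, keeps the elite fraction with the
    highest Q, and refits the Gaussian to the elites."""
    for _ in range(iters):
        actions = [random.gauss(mu, sigma) for _ in range(n_samples)]
        elites = sorted(actions, key=q, reverse=True)[:n_elite]
        mu = sum(elites) / n_elite
        var = sum((a - mu) ** 2 for a in elites) / n_elite
        sigma = max(math.sqrt(var), min_sigma)  # floor keeps sampling alive
    return mu

# Usage: a toy action-value function peaked at a = 1.3.
random.seed(0)
a_star = cem_max_action(lambda a: -(a - 1.3) ** 2)
```

The density shrinks toward the maximizer over iterations, which is why CEM tolerates arbitrary (non-concave) action-value parameterizations where gradient-based maximization might not.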
Using link and content over time for embedding generation in Dynamic Attributed Networks
A. P. Appel, R. L. F. Cunha, C. Aggarwal, Marcela Megumi Terakado
arXiv: Learning. Published 2018-07-17. DOI: 10.1007/978-3-030-10928-8_1 (https://doi.org/10.1007/978-3-030-10928-8_1)
Deep Learning on Low-Resource Datasets
Veronica Morfi, D. Stowell
arXiv: Learning. Published 2018-07-10. DOI: 10.20944/PREPRINTS201807.0185.V1 (https://doi.org/10.20944/PREPRINTS201807.0185.V1)

Abstract: In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording, without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve training performance on such low-resource datasets. We evaluate three data-efficient approaches to training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that the different training methods have different advantages and disadvantages.
Reliable clustering of Bernoulli mixture models
Amir Najafi, A. Motahari, H. Rabiee
arXiv: Learning. Published 2017-10-05. DOI: 10.3150/19-bej1173 (https://doi.org/10.3150/19-bej1173)

Abstract: A Bernoulli Mixture Model (BMM) is a finite mixture of random binary vectors with independent dimensions. The problem of clustering BMM data arises in a variety of real-world applications, ranging from population genetics to activity analysis in social networks. In this paper, we analyze the clusterability of BMMs from a theoretical perspective, when the number of clusters is unknown. In particular, we stipulate a set of conditions on the sample complexity and dimension of the model in order to guarantee the Probably Approximately Correct (PAC)-clusterability of a dataset. To the best of our knowledge, these findings are the first non-asymptotic bounds on the sample complexity of learning or clustering BMMs.
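The paper's contribution is theoretical (sample-complexity bounds), but the object it studies is easy to make concrete: below is a generic EM fit of a Bernoulli mixture, not an algorithm from the paper, to show what "clustering BMM data" means operationally.

```python
import math
import random

def bmm_em(data, k, iters=50, seed=0):
    """EM for a Bernoulli mixture: data is a list of binary vectors.
    Returns mixing weights, per-cluster Bernoulli parameters, and
    per-point cluster responsibilities."""
    rng = random.Random(seed)
    d = len(data[0])
    pi = [1.0 / k] * k
    # Random asymmetric init breaks the symmetry between clusters.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        # E-step: responsibilities via log-likelihoods for numerical stability.
        resp = []
        for x in data:
            logp = [math.log(pi[j]) + sum(
                        math.log(theta[j][t] if x[t] else 1 - theta[j][t])
                        for t in range(d))
                    for j in range(k)]
            m = max(logp)
            w = [math.exp(l - m) for l in logp]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: refit weights and Bernoulli parameters (clipped away from 0/1).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pi[j] = nj / len(data)
            theta[j] = [min(max(sum(r[j] * x[t] for r, x in zip(resp, data)) / nj,
                               1e-3), 1 - 1e-3)
                        for t in range(d)]
    return pi, theta, resp

# Usage: two well-separated binary prototypes, ten samples each.
data = [[1, 1, 1, 0, 0, 0]] * 10 + [[0, 0, 0, 1, 1, 1]] * 10
pi_, theta_, resp_ = bmm_em(data, k=2)
```

On separable data like this, the responsibilities split the two prototypes into distinct clusters with roughly equal mixing weights.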
Transitions, Losses, and Re-parameterizations: Elements of Prediction Games
Kamalaruban Parameswaran
arXiv: Learning. Published 2017-01-01. DOI: 10.25911/5D723BC67A01E (https://doi.org/10.25911/5D723BC67A01E)

Abstract: This thesis presents geometric insights into three types of two-player prediction games: the general learning task, prediction with expert advice, and online convex optimization. These games differ in the nature of the opponent (stochastic, adversarial, or intermediate), the order of the players' moves, and the utility function. The insights shed light on the intrinsic barriers of these prediction problems and on the design of computationally efficient learning algorithms with strong theoretical guarantees, such as generalizability, statistical consistency, and constant regret.
The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang
arXiv: Learning. Published 2016-11-16. DOI: 10.3929/ethz-a-010890124 (https://doi.org/10.3929/ethz-a-010890124)

Abstract: Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by an order of magnitude. We examine training at reduced precision, from both a theoretical and a practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We present a framework called ZipML to answer these questions. For linear models, the answer is yes. We develop a simple framework based on one novel strategy called double sampling. Our framework executes training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to 6.5x faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, it saves up to another 1.7x in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net. Finally, we extend our framework through approximation to non-linear models, such as SVMs. We show that, although using low-precision data induces bias, we can appropriately bound and control the bias. We find that in practice 8-bit precision is often sufficient to converge to the correct solution. Interestingly, however, we notice that in practice our framework does not always outperform the naive rounding approach. We discuss this negative result in detail.
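The unbiasedness that low-precision training relies on comes from stochastic rounding: quantizing to the nearest grid point above or below with probability proportional to proximity leaves the expectation unchanged, i.e. E[Q(x)] = x. Below is a minimal sketch of generic stochastic quantization, not ZipML's double-sampling estimator itself.

```python
import random

def stochastic_quantize(x, levels, lo=-1.0, hi=1.0, rng=random):
    """Unbiasedly quantize x onto a uniform grid of `levels` points in [lo, hi]:
    round up with probability equal to the fractional position between the two
    neighbouring grid points, so E[quantized value] == x."""
    step = (hi - lo) / (levels - 1)
    t = (x - lo) / step
    low = int(t)                   # index of the grid point at or below x
    frac = t - low                 # fractional position toward the next point
    idx = low + (1 if rng.random() < frac else 0)
    return lo + idx * step

# Usage: x = 0.3 on a 5-level grid {-1, -0.5, 0, 0.5, 1} quantizes to 0.0
# with probability 0.4 and to 0.5 with probability 0.6 — mean 0.3.
rng = random.Random(0)
samples = [stochastic_quantize(0.3, 5, rng=rng) for _ in range(20000)]
```

Naive (deterministic, nearest-neighbour) rounding would always return 0.5 here, which is exactly the bias the abstract contrasts against.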
Learning an Optimization Algorithm through Human Design Iterations
Thurston Sexton, Max Yi Ren
arXiv: Learning. Published 2016-08-24. DOI: 10.1115/1.4037344 (https://doi.org/10.1115/1.4037344)

Abstract: Solving optimal design problems through crowdsourcing faces a dilemma: on one hand, human beings have been shown to be more effective than algorithms at searching for good solutions to certain real-world problems with high-dimensional or discrete solution spaces; on the other hand, the cost of setting up crowdsourcing environments, the uncertainty in the crowd's domain-specific competence, and the lack of commitment of the crowd all contribute to the lack of real-world applications of design crowdsourcing. We are thus motivated to investigate a solution-searching mechanism in which an optimization algorithm is tuned from human demonstrations of solution searching, so that the search can be continued after human participants abandon the problem. To do so, we model the iterative search process as a Bayesian Optimization (BO) algorithm, and propose an inverse BO (IBO) algorithm to find the maximum likelihood estimators of the BO parameters based on human solutions. We show, through a vehicle design and control problem, that the search performance of BO can be improved by recovering its parameters from an effective human search. Thus, IBO has the potential to improve the success rate of design crowdsourcing activities by requiring only good search strategies, instead of good solutions, from the crowd.
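The forward process the paper inverts is ordinary Bayesian optimization. The compact numpy sketch below — generic BO with an RBF-kernel Gaussian-process surrogate and expected improvement, not the authors' IBO — shows the search loop whose parameters (e.g. the kernel length scale `ls`) IBO would estimate from human demonstrations.

```python
import math
import numpy as np

def rbf(X1, X2, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * np.subtract.outer(X1, X2) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and std at query points Xs, given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf(Xs, Xs)) - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sd, best):
    """EI for maximization: expected amount by which a sample beats `best`."""
    z = (mu - best) / sd
    Phi = np.array([0.5 * (1 + math.erf(v / math.sqrt(2))) for v in z])
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (mu - best) * Phi + sd * phi

def bayes_opt(f, n_init=3, n_iter=15, seed=0):
    """Maximize f over [0, 1] by repeatedly sampling the EI-maximizing point."""
    rng = np.random.default_rng(seed)
    X = list(rng.uniform(0, 1, n_init))
    y = [f(x) for x in X]
    grid = np.linspace(0, 1, 201)
    for _ in range(n_iter):
        mu, sd = gp_posterior(np.array(X), np.array(y), grid)
        x_next = float(grid[np.argmax(expected_improvement(mu, sd, max(y)))])
        X.append(x_next)
        y.append(f(x_next))
    i = int(np.argmax(y))
    return X[i], y[i]

# Usage: a toy design objective peaked at x = 0.7.
best_x, best_y = bayes_opt(lambda x: -(x - 0.7) ** 2)
```

In the paper's framing, a human's sequence of design iterations plays the role of this loop's queries, and IBO fits the surrogate/acquisition parameters that best explain that sequence.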