Compositional Transfer in Hierarchical Reinforcement Learning
Markus Wulfmeier, A. Abdolmaleki, Roland Hafner, J. T. Springenberg, Michael Neunert, Tim Hertweck, T. Lampe, Noah Siegel, N. Heess, Martin A. Riedmiller
arXiv: Learning. Published 2019-06-26. DOI: 10.15607/rss.2020.xvi.054 (https://doi.org/10.15607/rss.2020.xvi.054)

Abstract: The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data efficiency in domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks, as well as for scheduling tasks. The presented algorithm enables stable and fast learning in complex, real-world domains, in both the parallel multitask and the sequential transfer setting. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference, and we evaluate the benefits of additional incentives for efficient, compositional task solutions in single-task domains. Finally, we demonstrate substantial gains in data efficiency and final performance over competitive baselines in a week-long, physical robot stacking experiment.
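The cross-task data sharing the abstract describes can be made concrete with a minimal sketch (not the paper's implementation): a replay buffer that stores each off-policy transition once and relabels it with every task's reward, so experience collected for one task is reusable by all of them. The `Transition` layout and the `reward_fns` mapping are illustrative assumptions.

```python
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action next_state rewards")

class SharedReplayBuffer:
    """Replay buffer shared across tasks: each transition is stored once,
    annotated with a reward for every task, so any task can sample it."""

    def __init__(self, reward_fns, capacity=10000):
        self.reward_fns = reward_fns  # task name -> reward function
        self.buffer = []
        self.capacity = capacity

    def add(self, state, action, next_state):
        # Relabel the single transition with every task's reward.
        rewards = {task: fn(state, action, next_state)
                   for task, fn in self.reward_fns.items()}
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append(Transition(state, action, next_state, rewards))

    def sample(self, task, batch_size):
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        return [(t.state, t.action, t.next_state, t.rewards[task]) for t in batch]

# Usage with two hypothetical tasks over a scalar state:
buf = SharedReplayBuffer({"reach": lambda s, a, ns: -abs(ns),
                          "avoid": lambda s, a, ns: abs(ns)})
buf.add(0.0, 1.0, 0.5)
```

The same stored transition yields a different training tuple depending on which task samples it, which is the sense in which platform time is shared.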
Developing an ANFIS-PSO Based Model to Estimate Mercury Emission in Combustion Flue Gases
S. Shamshirband, A. Baghban, Masoud Hadipoor, A. Mosavi
arXiv: Learning. Published 2019-05-10. DOI: 10.20944/PREPRINTS201905.0124.V1 (https://doi.org/10.20944/PREPRINTS201905.0124.V1)

Abstract: Accurate prediction of the mercury content emitted from fossil-fueled power stations is of utmost importance for environmental pollution assessment and hazard mitigation. In this paper, the mercury content in the output gas from boilers was predicted using an Adaptive Neuro-Fuzzy Inference System (ANFIS) integrated with particle swarm optimization (PSO). Input parameters were selected from coal characteristics and the operational configuration of the boilers. The proposed ANFIS-PSO model captures, as a nonlinear model, the dependence of flue-gas mercury content on the specifications of the coal and the boiler type. In this study, operational information from 82 power plants was gathered and used to train and test the proposed model. To evaluate the model's performance, the mean absolute relative error (MARE%) was computed, yielding 0.003266 for training and 0.013272 for testing. Furthermore, relative errors between measured data and predicted values were between -0.25% and 0.1%, which confirms the accuracy of the ANFIS-PSO model.
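The PSO component can be illustrated generically. The sketch below is a standard particle swarm minimizer, not the authors' ANFIS coupling; in their setting, `objective` would stand in for the ANFIS training error as a function of the fuzzy-system parameters being tuned.

```python
import random

def pso(objective, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimizer: returns (best position, best value)."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # per-particle best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Usage on a toy objective (the sphere function, standing in for a model's error):
random.seed(0)
best, val = pso(lambda x: sum(v * v for v in x), dim=2)
```

The inertia and attraction coefficients (`w`, `c1`, `c2`) are conventional defaults, not values from the paper.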
Actor-Expert: A Framework for using Q-learning in Continuous Action Spaces
Sungsu Lim
arXiv: Learning. Published 2018-10-22. DOI: 10.7939/R3-QGDP-3872 (https://doi.org/10.7939/R3-QGDP-3872)

Abstract: Q-learning can be difficult to use in continuous action spaces, because an optimization has to be solved to find the maximal action for the action-values. A common strategy has been to restrict the functional form of the action-values to be concave in the actions, to simplify the optimization. Such restrictions, however, can prevent learning accurate action-values. In this work, we propose a new policy search objective that facilitates using Q-learning, and a framework to optimize this objective, called Actor-Expert. The Expert uses Q-learning to update the action-values towards optimal action-values. The Actor learns the maximal actions over time for these changing action-values. We develop a Cross Entropy Method (CEM) for the Actor, where such a global optimization approach facilitates the use of generically parameterized action-values. This method, which we call Conditional CEM, iteratively concentrates density around maximal actions, conditioned on state. We prove that this algorithm tracks the expected CEM update over states with changing action-values. We demonstrate in a toy environment that previous methods that restrict the action-value parameterization fail, whereas Actor-Expert with a more general action-value parameterization succeeds. Finally, we demonstrate that Actor-Expert performs as well as or better than competitors on four benchmark continuous-action environments.
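The core CEM step — iteratively concentrating a sampling density around maximal actions of a fixed Q — can be sketched for a single state with a 1-D action. This is a simplification of the paper's Conditional CEM (which conditions on state and tracks changing action-values); the quadratic `q` in the usage is a stand-in.

```python
import math
import random

def cem_max_action(q, n_samples=64, n_elite=8, iters=20,
                   mu=0.0, sigma=2.0, min_sigma=1e-3):
    """Cross-entropy search for argmax_a q(a) over a 1-D continuous action.
    Samples actions from a Gaussian, keeps the elite fraction with the
    highest Q, and refits the Gaussian to the elites."""
    for _ in range(iters):
        actions = [random.gauss(mu, sigma) for _ in range(n_samples)]
        elites = sorted(actions, key=q, reverse=True)[:n_elite]
        mu = sum(elites) / n_elite
        var = sum((a - mu) ** 2 for a in elites) / n_elite
        sigma = max(math.sqrt(var), min_sigma)  # floor keeps sampling alive
    return mu

# Usage: a toy action-value function peaked at a = 1.3.
random.seed(0)
a_star = cem_max_action(lambda a: -(a - 1.3) ** 2)
```

The density shrinks toward the maximizer over iterations, which is why CEM tolerates arbitrary (non-concave) action-value parameterizations where gradient-based maximization might not.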
Using link and content over time for embedding generation in Dynamic Attributed Networks
A. P. Appel, R. L. F. Cunha, C. Aggarwal, Marcela Megumi Terakado
arXiv: Learning. Published 2018-07-17. DOI: 10.1007/978-3-030-10928-8_1 (https://doi.org/10.1007/978-3-030-10928-8_1)
Deep Learning on Low-Resource Datasets
Veronica Morfi, D. Stowell
arXiv: Learning. Published 2018-07-10. DOI: 10.20944/PREPRINTS201807.0185.V1 (https://doi.org/10.20944/PREPRINTS201807.0185.V1)

Abstract: In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording, without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve training performance on such low-resource datasets. We evaluate three data-efficient approaches to training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that the different training methods have different advantages and disadvantages.
Reliable clustering of Bernoulli mixture models
Amir Najafi, A. Motahari, H. Rabiee
arXiv: Learning. Published 2017-10-05. DOI: 10.3150/19-bej1173 (https://doi.org/10.3150/19-bej1173)

Abstract: A Bernoulli Mixture Model (BMM) is a finite mixture of random binary vectors with independent dimensions. The problem of clustering BMM data arises in a variety of real-world applications, ranging from population genetics to activity analysis in social networks. In this paper, we analyze the clusterability of BMMs from a theoretical perspective, when the number of clusters is unknown. In particular, we stipulate a set of conditions on the sample complexity and dimension of the model in order to guarantee the Probably Approximately Correct (PAC)-clusterability of a dataset. To the best of our knowledge, these findings are the first non-asymptotic bounds on the sample complexity of learning or clustering BMMs.
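The paper's contribution is theoretical (sample-complexity bounds), but the object it studies is easy to make concrete: below is a generic EM fit of a Bernoulli mixture, not an algorithm from the paper, to show what "clustering BMM data" means operationally.

```python
import math
import random

def bmm_em(data, k, iters=50, seed=0):
    """EM for a Bernoulli mixture: data is a list of binary vectors.
    Returns mixing weights, per-cluster Bernoulli parameters, and
    per-point cluster responsibilities."""
    rng = random.Random(seed)
    d = len(data[0])
    pi = [1.0 / k] * k
    # Random asymmetric init breaks the symmetry between clusters.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        # E-step: responsibilities via log-likelihoods for numerical stability.
        resp = []
        for x in data:
            logp = [math.log(pi[j]) + sum(
                        math.log(theta[j][t] if x[t] else 1 - theta[j][t])
                        for t in range(d))
                    for j in range(k)]
            m = max(logp)
            w = [math.exp(l - m) for l in logp]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: refit weights and Bernoulli parameters (clipped away from 0/1).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pi[j] = nj / len(data)
            theta[j] = [min(max(sum(r[j] * x[t] for r, x in zip(resp, data)) / nj,
                               1e-3), 1 - 1e-3)
                        for t in range(d)]
    return pi, theta, resp

# Usage: two well-separated binary prototypes, ten samples each.
data = [[1, 1, 1, 0, 0, 0]] * 10 + [[0, 0, 0, 1, 1, 1]] * 10
pi_, theta_, resp_ = bmm_em(data, k=2)
```

On separable data like this, the responsibilities split the two prototypes into distinct clusters with roughly equal mixing weights.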
Transitions, Losses, and Re-parameterizations: Elements of Prediction Games
Kamalaruban Parameswaran
arXiv: Learning. Published 2017-01-01. DOI: 10.25911/5D723BC67A01E (https://doi.org/10.25911/5D723BC67A01E)

Abstract: This thesis presents geometric insights into three types of two-player prediction games: the general learning task, prediction with expert advice, and online convex optimization. These games differ in the nature of the opponent (stochastic, adversarial, or intermediate), the order of the players' moves, and the utility function. The insights shed light on the intrinsic barriers of these prediction problems and on the design of computationally efficient learning algorithms with strong theoretical guarantees, such as generalizability, statistical consistency, and constant regret.
The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang
arXiv: Learning. Published 2016-11-16. DOI: 10.3929/ethz-a-010890124 (https://doi.org/10.3929/ethz-a-010890124)

Abstract: Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by an order of magnitude. We examine training at reduced precision, from both a theoretical and a practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We present a framework called ZipML to answer these questions. For linear models, the answer is yes. We develop a simple framework based on one novel strategy called double sampling. Our framework executes training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to 6.5x faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, it saves up to another 1.7x in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net. Finally, we extend our framework through approximation to non-linear models, such as SVMs. We show that, although using low-precision data induces bias, we can appropriately bound and control the bias. We find that in practice 8-bit precision is often sufficient to converge to the correct solution. Interestingly, however, we notice that in practice our framework does not always outperform the naive rounding approach. We discuss this negative result in detail.
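The unbiasedness that low-precision training relies on comes from stochastic rounding: quantizing to the nearest grid point above or below with probability proportional to proximity leaves the expectation unchanged, i.e. E[Q(x)] = x. Below is a minimal sketch of generic stochastic quantization, not ZipML's double-sampling estimator itself.

```python
import random

def stochastic_quantize(x, levels, lo=-1.0, hi=1.0, rng=random):
    """Unbiasedly quantize x onto a uniform grid of `levels` points in [lo, hi]:
    round up with probability equal to the fractional position between the two
    neighbouring grid points, so E[quantized value] == x."""
    step = (hi - lo) / (levels - 1)
    t = (x - lo) / step
    low = int(t)                   # index of the grid point at or below x
    frac = t - low                 # fractional position toward the next point
    idx = low + (1 if rng.random() < frac else 0)
    return lo + idx * step

# Usage: x = 0.3 on a 5-level grid {-1, -0.5, 0, 0.5, 1} quantizes to 0.0
# with probability 0.4 and to 0.5 with probability 0.6 — mean 0.3.
rng = random.Random(0)
samples = [stochastic_quantize(0.3, 5, rng=rng) for _ in range(20000)]
```

Naive (deterministic, nearest-neighbour) rounding would always return 0.5 here, which is exactly the bias the abstract contrasts against.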
Learning an Optimization Algorithm through Human Design Iterations
Thurston Sexton, Max Yi Ren
arXiv: Learning. Published 2016-08-24. DOI: 10.1115/1.4037344 (https://doi.org/10.1115/1.4037344)

Abstract: Solving optimal design problems through crowdsourcing faces a dilemma: on one hand, human beings have been shown to be more effective than algorithms at searching for good solutions to certain real-world problems with high-dimensional or discrete solution spaces; on the other hand, the cost of setting up crowdsourcing environments, the uncertainty in the crowd's domain-specific competence, and the lack of commitment of the crowd all contribute to the lack of real-world applications of design crowdsourcing. We are thus motivated to investigate a solution-searching mechanism in which an optimization algorithm is tuned from human demonstrations of solution searching, so that the search can be continued after human participants abandon the problem. To do so, we model the iterative search process as a Bayesian Optimization (BO) algorithm, and propose an inverse BO (IBO) algorithm to find the maximum likelihood estimators of the BO parameters based on human solutions. We show, through a vehicle design and control problem, that the search performance of BO can be improved by recovering its parameters from an effective human search. Thus, IBO has the potential to improve the success rate of design crowdsourcing activities by requiring only good search strategies, instead of good solutions, from the crowd.
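The forward process the paper inverts is ordinary Bayesian optimization. The compact numpy sketch below — generic BO with an RBF-kernel Gaussian-process surrogate and expected improvement, not the authors' IBO — shows the search loop whose parameters (e.g. the kernel length scale `ls`) IBO would estimate from human demonstrations.

```python
import math
import numpy as np

def rbf(X1, X2, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * np.subtract.outer(X1, X2) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and std at query points Xs, given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf(Xs, Xs)) - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sd, best):
    """EI for maximization: expected amount by which a sample beats `best`."""
    z = (mu - best) / sd
    Phi = np.array([0.5 * (1 + math.erf(v / math.sqrt(2))) for v in z])
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (mu - best) * Phi + sd * phi

def bayes_opt(f, n_init=3, n_iter=15, seed=0):
    """Maximize f over [0, 1] by repeatedly sampling the EI-maximizing point."""
    rng = np.random.default_rng(seed)
    X = list(rng.uniform(0, 1, n_init))
    y = [f(x) for x in X]
    grid = np.linspace(0, 1, 201)
    for _ in range(n_iter):
        mu, sd = gp_posterior(np.array(X), np.array(y), grid)
        x_next = float(grid[np.argmax(expected_improvement(mu, sd, max(y)))])
        X.append(x_next)
        y.append(f(x_next))
    i = int(np.argmax(y))
    return X[i], y[i]

# Usage: a toy design objective peaked at x = 0.7.
best_x, best_y = bayes_opt(lambda x: -(x - 0.7) ** 2)
```

In the paper's framing, a human's sequence of design iterations plays the role of this loop's queries, and IBO fits the surrogate/acquisition parameters that best explain that sequence.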