arXiv: Learning — Latest Articles

Compositional Transfer in Hierarchical Reinforcement Learning
arXiv: Learning | Pub Date: 2019-06-26 | DOI: 10.15607/rss.2020.xvi.054
Markus Wulfmeier, A. Abdolmaleki, Roland Hafner, J. T. Springenberg, Michael Neunert, Tim Hertweck, T. Lampe, Noah Siegel, N. Heess, Martin A. Riedmiller
The successful application of general reinforcement learning algorithms to real-world robotics is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks, as well as for scheduling tasks. The presented algorithm enables stable and fast learning in complex, real-world domains, in both the parallel multitask and the sequential transfer case. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference, and we evaluate the benefits of additional incentives for efficient, compositional task solutions in single-task domains. Finally, we demonstrate substantial data-efficiency and final performance gains over competitive baselines in a week-long physical robot stacking experiment.
Citations: 30
Developing an ANFIS-PSO Based Model to Estimate Mercury Emission in Combustion Flue Gases
arXiv: Learning | Pub Date: 2019-05-10 | DOI: 10.20944/PREPRINTS201905.0124.V1
S. Shamshirband, A. Baghban, Masoud Hadipoor, A. Mosavi
Accurate prediction of the mercury content emitted from fossil-fueled power stations is of utmost importance for environmental pollution assessment and hazard mitigation. In this paper, the mercury content in the output gas from boilers was predicted using an Adaptive Neuro-Fuzzy Inference System (ANFIS) integrated with particle swarm optimization (PSO). Input parameters were selected from coal characteristics and the operational configuration of the boilers. The proposed ANFIS-PSO model develops a nonlinear model that captures the dependency of flue-gas mercury content on the specifications of the coal and the boiler type. Operational information from 82 power plants was gathered and employed to train and test the proposed model. Performance was evaluated with the MARE% statistic, which yielded 0.003266 for training and 0.013272 for testing. Furthermore, relative errors between measured and predicted values fell between -0.25% and 0.1%, confirming the accuracy of the ANFIS-PSO model.
Citations: 1
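The ANFIS side of the model above is involved, but the PSO component that tunes it is simple to sketch on its own. The following generic particle-swarm minimizer (all parameters and the test function are illustrative, not taken from the paper) shows the search mechanism: each particle blends inertia with attraction toward its personal best and the swarm's global best.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=100, seed=0):
    """Plain particle swarm optimization: each particle tracks its own best
    position, the swarm tracks a global best, and velocities blend inertia
    with attraction toward both."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Sphere function, minimum at the origin
g = pso_minimize(lambda p: np.sum(p ** 2), dim=3)
```

In the paper's setting, `f` would score a candidate set of ANFIS membership-function parameters by the resulting prediction error rather than a toy sphere function.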
Actor-Expert: A Framework for using Q-learning in Continuous Action Spaces
arXiv: Learning | Pub Date: 2018-10-22 | DOI: 10.7939/R3-QGDP-3872
Sungsu Lim
Q-learning can be difficult to use in continuous action spaces, because an optimization problem has to be solved to find the maximal action for the action-values. A common strategy has been to restrict the functional form of the action-values to be concave in the actions, to simplify the optimization. Such restrictions, however, can prevent learning accurate action-values. In this work, we propose a new policy search objective that facilitates using Q-learning, and a framework to optimize this objective, called Actor-Expert. The Expert uses Q-learning to update the action-values towards optimal action-values. The Actor learns the maximal actions over time for these changing action-values. We develop a Cross Entropy Method (CEM) for the Actor, where such a global optimization approach facilitates the use of generically parameterized action-values. This method, which we call Conditional CEM, iteratively concentrates density around maximal actions, conditioned on state. We prove that this algorithm tracks the expected CEM update over states with changing action-values. We demonstrate in a toy environment that previous methods that restrict the action-value parameterization fail, whereas Actor-Expert with a more general parameterization succeeds. Finally, we demonstrate that Actor-Expert performs as well as or better than competitors on four benchmark continuous-action environments.
Citations: 11
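The core move of the Actor — concentrating a sampling density around the maximizing action of a Q-function — can be illustrated with a plain, unconditioned CEM search. This is a generic sketch against a fixed toy Q-function, not the paper's Conditional CEM (which conditions the density on state and tracks a changing Q).

```python
import numpy as np

def cem_argmax(q, dim, iters=30, pop=64, elite_frac=0.25, seed=0):
    """Search for argmax_a q(a) over a continuous action space with the
    Cross Entropy Method: sample a Gaussian population of actions, keep
    the top-scoring elites, refit the Gaussian to them, and repeat so the
    density concentrates around the maximizing action."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        actions = rng.normal(mu, sigma, size=(pop, dim))
        values = np.array([q(a) for a in actions])
        elites = actions[np.argsort(values)[-n_elite:]]
        mu = elites.mean(axis=0)
        sigma = elites.std(axis=0) + 1e-6  # floor keeps sampling alive
    return mu

# Toy action-value with its maximum at a = (0.5, -0.3)
q = lambda a: -np.sum((a - np.array([0.5, -0.3])) ** 2)
a_star = cem_argmax(q, dim=2)
```

Because CEM only needs to evaluate `q`, it places no concavity restriction on the action-value parameterization, which is the point the abstract makes against input-concave architectures.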
Using link and content over time for embedding generation in Dynamic Attributed Networks
arXiv: Learning | Pub Date: 2018-07-17 | DOI: 10.1007/978-3-030-10928-8_1
A. P. Appel, R. L. F. Cunha, C. Aggarwal, Marcela Megumi Terakado
Citations: 20
Deep Learning on Low-Resource Datasets
arXiv: Learning | Pub Date: 2018-07-10 | DOI: 10.20944/PREPRINTS201807.0185.V1
Veronica Morfi, D. Stowell
In training a deep learning system to perform audio transcription, two practical problems may arise. First, most datasets are weakly labelled, having only a list of the events present in each recording, without any temporal information for training. Second, deep neural networks need a very large amount of labelled training data to perform well, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve training performance on such low-resource datasets. We evaluate three data-efficient approaches to training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that the different training methods have different advantages and disadvantages.
Citations: 0
Multi-view Ensemble Classification for Clinically Actionable Genetic Mutations
arXiv: Learning | Pub Date: 2018-06-26 | DOI: 10.1007/978-3-319-94042-7_5
Xi Sheryl Zhang, Dandi Chen, Yongjun Zhu, Chao Che, Chang Su, Sendong Zhao, X. Min, Fei Wang
Pages: 79-99
Citations: 3
Reliable clustering of Bernoulli mixture models
arXiv: Learning | Pub Date: 2017-10-05 | DOI: 10.3150/19-bej1173
Amir Najafi, A. Motahari, H. Rabiee
A Bernoulli Mixture Model (BMM) is a finite mixture of random binary vectors with independent dimensions. The problem of clustering BMM data arises in a variety of real-world applications, ranging from population genetics to activity analysis in social networks. In this paper, we analyze the clusterability of BMMs from a theoretical perspective, when the number of clusters is unknown. In particular, we stipulate a set of conditions on the sample complexity and dimension of the model that guarantee the Probably Approximately Correct (PAC)-clusterability of a dataset. To the best of our knowledge, these findings are the first non-asymptotic bounds on the sample complexity of learning or clustering BMMs.
Citations: 8
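The generative model in the abstract is easy to simulate: draw a latent component by its mixing weight, then flip each bit independently with that component's mean probability. A minimal sketch (the component means and weights below are made up for illustration; when the means are well separated, even thresholding the per-sample mean bit recovers the component):

```python
import numpy as np

def sample_bmm(n, weights, means, seed=0):
    """Draw n binary vectors from a Bernoulli Mixture Model: pick a
    component index z by its mixing weight, then set each of the d bits
    to 1 independently with probability means[z, j]."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(weights), size=n, p=weights)  # latent component labels
    bits = rng.random((n, means.shape[1])) < means[z]
    return bits, z

# Two well-separated components in 20 binary dimensions
means = np.vstack([np.full(20, 0.9), np.full(20, 0.1)])
X, z = sample_bmm(1000, [0.5, 0.5], means)
# Rows from component 0 average ~0.9 ones per bit, component 1 ~0.1,
# so the fraction of ones in a row already separates the clusters.
```

The paper's question is harder than this easy regime: it asks under what sample-complexity and dimension conditions clustering remains PAC-guaranteed when the number of components is unknown.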
Transitions, Losses, and Re-parameterizations: Elements of Prediction Games
arXiv: Learning | Pub Date: 2017-01-01 | DOI: 10.25911/5D723BC67A01E
Kamalaruban Parameswaran
This thesis presents geometric insights into three types of two-player prediction games: the general learning task, prediction with expert advice, and online convex optimization. These games differ in the nature of the opponent (stochastic, adversarial, or intermediate), the order of the players' moves, and the utility function. The insights shed light on the intrinsic barriers of these prediction problems and on the design of computationally efficient learning algorithms with strong theoretical guarantees (such as generalizability, statistical consistency, and constant regret).
Citations: 0
The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
arXiv: Learning | Pub Date: 2016-11-16 | DOI: 10.3929/ethz-a-010890124
Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang
Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by an order of magnitude. We examine training at reduced precision, from both a theoretical and a practical perspective, and ask: is it possible to train models end-to-end at low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We present a framework called ZipML to answer these questions. For linear models, the answer is yes. We develop a simple framework based on one simple but novel strategy called double sampling. Our framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to 6.5x faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, it saves up to another 1.7x in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net. Finally, we extend our framework through approximation to non-linear models such as SVMs. We show that, although using low-precision data induces bias, we can appropriately bound and control the bias. We find that in practice 8-bit precision is often sufficient to converge to the correct solution. Interestingly, however, we notice that our framework does not always outperform the naive rounding approach in practice; we discuss this negative result in detail.
Citations: 18
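The reason naive quantization biases training while ZipML's scheme does not comes down to stochastic (unbiased) rounding: round to the upper grid point with probability equal to the fractional offset, so the quantized value equals the original in expectation. A minimal sketch of that primitive (the grid step and sample value here are illustrative, not from the paper):

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Quantize x to a grid of spacing `step`, rounding up with probability
    equal to the fractional part, so that E[quantized] == x exactly."""
    scaled = x / step
    lower = np.floor(scaled)
    frac = scaled - lower                     # probability of rounding up
    return (lower + (rng.random(x.shape) < frac)) * step

rng = np.random.default_rng(0)
x = np.full(100_000, 0.3)                     # sits between grid points 0.0 and 0.5
q = stochastic_round(x, step=0.5, rng=rng)
# Every draw is 0.0 or 0.5, yet the average stays near 0.3 (unbiased);
# naive nearest-grid rounding would return 0.5 every time, a fixed bias.
```

Double sampling in the paper goes further, drawing two independent quantized copies of the data so that products of quantized quantities in the gradient also stay unbiased.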
Learning an Optimization Algorithm through Human Design Iterations
arXiv: Learning | Pub Date: 2016-08-24 | DOI: 10.1115/1.4037344
Thurston Sexton, Max Yi Ren
Solving optimal design problems through crowdsourcing faces a dilemma: on one hand, human beings have been shown to be more effective than algorithms at searching for good solutions to certain real-world problems with high-dimensional or discrete solution spaces; on the other hand, the cost of setting up crowdsourcing environments, the uncertainty in the crowd's domain-specific competence, and the crowd's lack of commitment all contribute to the lack of real-world applications of design crowdsourcing. We are thus motivated to investigate a solution-searching mechanism in which an optimization algorithm is tuned based on human demonstrations of solution searching, so that the search can be continued after human participants abandon the problem. To do so, we model the iterative search process as a Bayesian Optimization (BO) algorithm and propose an inverse BO (IBO) algorithm to find the maximum likelihood estimators of the BO parameters from human solutions. We show, through a vehicle design and control problem, that the search performance of BO can be improved by recovering its parameters from an effective human search. Thus, IBO has the potential to improve the success rate of design crowdsourcing activities by requiring only good search strategies, rather than good solutions, from the crowd.
Citations: 19