迈向智能系统的一般评估:从重现AIQ测试结果中吸取的教训

Journal of Artificial General Intelligence Pub Date : 2018-03-07 DOI:10.2478/jagi-2018-0001

Ondrej Vadinský

{"title":"迈向智能系统的一般评估:从重现AIQ测试结果中吸取的教训","authors":"Ondrej Vadinský","doi":"10.2478/jagi-2018-0001","DOIUrl":null,"url":null,"abstract":"Abstract This paper attempts to replicate the results of evaluating several artificial agents using the Algorithmic Intelligence Quotient test originally reported by Legg and Veness. Three experiments were conducted: One using default settings, one in which the action space was varied and one in which the observation space was varied. While the performance of freq, Q0, Qλ, and HLQλ corresponded well with the original results, the resulting values differed, when using MC-AIXI. Varying the observation space seems to have no qualitative impact on the results as reported, while (contrary to the original results) varying the action space seems to have some impact. An analysis of the impact of modifying parameters of MC-AIXI on its performance in the default settings was carried out with the help of data mining techniques used to identifying highly performing configurations. Overall, the Algorithmic Intelligence Quotient test seems to be reliable, however as a general artificial intelligence evaluation method it has several limits. The test is dependent on the chosen reference machine and also sensitive to changes to its settings. It brings out some differences among agents, however, since they are limited in size, the test setting may not yet be sufficiently complex. A demanding parameter sweep is needed to thoroughly evaluate configurable agents that, together with the test format, further highlights computational requirements of an agent. These and other issues are discussed in the paper along with proposals suggesting how to alleviate them. An implementation of some of the proposals is also demonstrated.","PeriodicalId":247142,"journal":{"name":"Journal of Artificial General Intelligence","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Towards General Evaluation of Intelligent Systems: Lessons Learned from Reproducing AIQ Test Results\",\"authors\":\"Ondrej Vadinský\",\"doi\":\"10.2478/jagi-2018-0001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract This paper attempts to replicate the results of evaluating several artificial agents using the Algorithmic Intelligence Quotient test originally reported by Legg and Veness. Three experiments were conducted: One using default settings, one in which the action space was varied and one in which the observation space was varied. While the performance of freq, Q0, Qλ, and HLQλ corresponded well with the original results, the resulting values differed, when using MC-AIXI. Varying the observation space seems to have no qualitative impact on the results as reported, while (contrary to the original results) varying the action space seems to have some impact. An analysis of the impact of modifying parameters of MC-AIXI on its performance in the default settings was carried out with the help of data mining techniques used to identifying highly performing configurations. Overall, the Algorithmic Intelligence Quotient test seems to be reliable, however as a general artificial intelligence evaluation method it has several limits. The test is dependent on the chosen reference machine and also sensitive to changes to its settings. It brings out some differences among agents, however, since they are limited in size, the test setting may not yet be sufficiently complex. A demanding parameter sweep is needed to thoroughly evaluate configurable agents that, together with the test format, further highlights computational requirements of an agent. These and other issues are discussed in the paper along with proposals suggesting how to alleviate them. An implementation of some of the proposals is also demonstrated.\",\"PeriodicalId\":247142,\"journal\":{\"name\":\"Journal of Artificial General Intelligence\",\"volume\":\"80 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Artificial General Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/jagi-2018-0001\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Artificial General Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/jagi-2018-0001","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

摘要本文试图复制Legg和Veness最初报道的算法智商测试(Algorithmic Intelligence Quotient test)评估几种人工智能体的结果。我们进行了三个实验:一个是使用默认设置，一个是改变动作空间，一个是改变观察空间。使用MC-AIXI时，freq、Q0、Qλ和HLQλ的性能与原始结果一致，但结果值不同。改变观察空间似乎对报告的结果没有定性影响，而(与原始结果相反)改变行动空间似乎有一些影响。利用数据挖掘技术，分析了MC-AIXI在默认设置下修改参数对其性能的影响。总的来说，算法智商测试似乎是可靠的，但是作为一种通用的人工智能评估方法，它有一些局限性。测试依赖于所选择的参考机器，并且对其设置的变化也很敏感。然而，由于它们的大小有限，测试设置可能还不够复杂，因此在代理之间存在一些差异。需要进行严格的参数扫描来彻底评估可配置代理，这些可配置代理与测试格式一起进一步突出了代理的计算需求。本文对这些问题和其他问题进行了讨论，并提出了如何缓解这些问题的建议。本文还演示了其中一些建议的实现。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Towards General Evaluation of Intelligent Systems: Lessons Learned from Reproducing AIQ Test Results

Abstract This paper attempts to replicate the results of evaluating several artificial agents using the Algorithmic Intelligence Quotient test originally reported by Legg and Veness. Three experiments were conducted: One using default settings, one in which the action space was varied and one in which the observation space was varied. While the performance of freq, Q0, Qλ, and HLQλ corresponded well with the original results, the resulting values differed, when using MC-AIXI. Varying the observation space seems to have no qualitative impact on the results as reported, while (contrary to the original results) varying the action space seems to have some impact. An analysis of the impact of modifying parameters of MC-AIXI on its performance in the default settings was carried out with the help of data mining techniques used to identifying highly performing configurations. Overall, the Algorithmic Intelligence Quotient test seems to be reliable, however as a general artificial intelligence evaluation method it has several limits. The test is dependent on the chosen reference machine and also sensitive to changes to its settings. It brings out some differences among agents, however, since they are limited in size, the test setting may not yet be sufficiently complex. A demanding parameter sweep is needed to thoroughly evaluate configurable agents that, together with the test format, further highlights computational requirements of an agent. These and other issues are discussed in the paper along with proposals suggesting how to alleviate them. An implementation of some of the proposals is also demonstrated.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Artificial General Intelligence

自引率

0.00%

发文量