Ali Shehper, Anibal M. Medina-Mardones, Bartłomiej Lewandowski, Angus Gruen, Piotr Kucharski, Sergei Gukov
arXiv - MATH - Group Theory, 2024-08-27, https://doi.org/arxiv-2408.15332
What makes math problems hard for reinforcement learning: a case study
Using a long-standing conjecture from combinatorial group theory, we explore,
from multiple angles, the challenges of finding rare instances carrying
disproportionately high rewards. Based on lessons learned in the mathematical
context defined by the Andrews-Curtis conjecture, we propose algorithmic
improvements that can be relevant in other domains with ultra-sparse reward
problems. Although our case study can be formulated as a game, its shortest
winning sequences are potentially $10^6$ or $10^9$ times longer than those
encountered in chess. In the process of our study, we demonstrate that one of
the potential counterexamples due to Akbulut and Kirby, whose status escaped
direct mathematical methods for 39 years, is stably AC-trivial.
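
To make the game formulation concrete, the following is a minimal sketch of Andrews-Curtis moves on a balanced presentation with two generators, written in Python. It is not taken from the paper: the encoding of words as tuples of integers (1 for x, 2 for y, negatives for inverses), the function names, and the choice of move set (concatenate, invert, conjugate by a generator) are all assumptions made for illustration. A state counts as "won" only when both relators are single generators, which is one way to picture why the reward is so sparse.

```python
from typing import Tuple

# Hypothetical encoding: 1 = x, -1 = x^-1, 2 = y, -2 = y^-1.
Word = Tuple[int, ...]
Presentation = Tuple[Word, Word]


def free_reduce(word: Word) -> Word:
    """Cancel adjacent inverse pairs such as x x^-1 until none remain."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()
        else:
            stack.append(letter)
    return tuple(stack)


def inverse(word: Word) -> Word:
    """Inverse of a word: reverse the order and invert each letter."""
    return tuple(-letter for letter in reversed(word))


def ac_concat(pres: Presentation, i: int) -> Presentation:
    """Replace relator i by (relator i)(relator j), where j is the other index."""
    new = list(pres)
    new[i] = free_reduce(pres[i] + pres[1 - i])
    return (new[0], new[1])


def ac_invert(pres: Presentation, i: int) -> Presentation:
    """Replace relator i by its inverse."""
    new = list(pres)
    new[i] = inverse(pres[i])
    return (new[0], new[1])


def ac_conjugate(pres: Presentation, i: int, g: int) -> Presentation:
    """Replace relator i by g (relator i) g^-1 for a generator or inverse g."""
    new = list(pres)
    new[i] = free_reduce((g,) + pres[i] + (-g,))
    return (new[0], new[1])


def is_trivial(pres: Presentation) -> bool:
    """True when the relators are x and y, up to order and inversion."""
    if any(len(r) != 1 for r in pres):
        return False
    return {abs(r[0]) for r in pres} == {1, 2}


if __name__ == "__main__":
    # Toy instance (an assumption, not from the paper): <x, y | x, y x>.
    state: Presentation = ((1,), (2, 1))
    state = ac_invert(state, 0)   # relators: x^-1, y x
    state = ac_concat(state, 1)   # relators: x^-1, y
    state = ac_invert(state, 0)   # relators: x, y
    print(state, is_trivial(state))
```

In this toy instance three moves suffice. The abstract's point is that for hard instances the shortest winning sequences may be many orders of magnitude longer, so an agent that is rewarded only at the trivial presentation almost never sees a reward signal.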