Investigating the bugs in reinforcement learning programs: Insights from Stack Overflow and GitHub
Authors: Jiayin Song, Yike Li, Yunzhe Tian, Haoxuan Ma, Honglei Li, Jie Zuo, Jiqiang Liu, Wenjia Niu
Journal: Automated Software Engineering, volume 33, issue 1 (impact factor 3.1; JCR Q3, Computer Science, Software Engineering)
Published: 2025-09-23
DOI: 10.1007/s10515-025-00555-z
URL: https://link.springer.com/article/10.1007/s10515-025-00555-z
Citation count: 0
Abstract
Reinforcement learning (RL) is increasingly applied in areas such as gaming, robotic control, and autonomous driving. Like deep learning systems, RL systems encounter failures during operation. However, RL differs from deep learning in its error causes and symptom manifestations. What are the differences in error causes and symptoms between RL and deep learning? How are RL errors and their symptoms related? Understanding the symptoms and causes of RL failures can advance research on RL failure detection and repair. In this paper, we conducted a comprehensive empirical study by collecting 1,155 error reports from the popular Q&A forum Stack Overflow and four GitHub repositories: baselines, stable-baselines3, tianshou, and keras-rl. We analyzed the root causes and symptoms of these failures and examined how resolution times differ across root causes. Additionally, we analyzed the correlations between causes and symptoms. Our study yielded 14 key findings and six implications for developing RL failure detection and repair tools. Our work is the first to integrate LLM-based analysis with manual validation for RL bug studies, providing actionable insights for tool development and testing strategies.
About the journal:
This journal publishes research papers, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.