Authors: Leuson Da Silva, Jordan Samhi, Foutse Khomh
Journal: Journal of Systems and Software, volume 230, Article 112541
Published: 2025-07-03
DOI: 10.1016/j.jss.2025.112541
URL: https://www.sciencedirect.com/science/article/pii/S0164121225002109
Impact factor: 4.1 (JCR Q1, Computer Science, Software Engineering)
LLMs and Stack Overflow discussions: Reliability, impact, and challenges
Since its release in November 2022, ChatGPT has shaken up Stack Overflow, the premier platform for developers’ queries on programming and software development. Demonstrating an ability to generate instant, human-like responses to technical questions, ChatGPT has ignited debates within the developer community about the evolving role of human-driven platforms in the age of generative AI. Two months after ChatGPT’s release, Meta released its answer with its own Large Language Model (LLM) called LLaMA: the race was on. We conducted an empirical study analyzing questions from Stack Overflow and using these LLMs to address them. In this way, we aim to quantify the reliability of LLMs’ answers and their potential to replace Stack Overflow in the long term; identify and understand why LLMs fail; measure how users’ activity on Stack Overflow has evolved over time; and compare the LLMs with one another. Our empirical results are unequivocal: ChatGPT and LLaMA challenge human expertise, yet do not outperform it in some domains, while a significant decline in user posting activity has been observed. We also discuss the impact of our findings on the usage and development of new LLMs and provide guidelines for future challenges faced by users and researchers.
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.