Python’s evolution on Stack Overflow: An empirical analysis of topic trends
Fengqi Hu, Weihao Xue, Siyuan Zhou, Ye Wang, Bo Jiang, Qiao Huang, Hua Zhang
Journal of Computer Languages, Volume 84, Article 101340. Published 2025-06-12.
DOI: 10.1016/j.cola.2025.101340
URL: https://www.sciencedirect.com/science/article/pii/S2590118425000267
Journal impact factor: 1.8; JCR: Q3 (Computer Science, Software Engineering)
Citations: 0
Abstract
With the rapid development of information technology and changing programming practices, demand for programming discussions on online Q&A platforms is growing. This study analyzes over two million Python-related posts on Stack Overflow to identify core topics and challenges over a fifteen-year period. Using a Gradient Boosting Decision Tree (GBDT) model to quantify post popularity, we show objectively which Python topics were the most popular, and which were the most troublesome, for users in different periods. We find that the domains most closely associated with Python are data processing and machine learning, while development environments and automation and testing are gradually gaining popularity. Machine learning is the area that troubles users the most. Moreover, we find that some questions that confuse users can increase the popularity of related topics. These findings can help developers follow the direction of the Python language so that they can better plan their personal learning and project development. Enterprises and organizations can also use trends in hot topics to optimize resource allocation for training, tool development, and technical support.