Title: Advancing research software engineering with AI: a research framework
Authors: Siamak Farshidi, Kwabena Ebo Bennin, Önder Babur, June Sallou, Ayalew Kassahun, Bedir Tekinerdogan
Journal: Automated Software Engineering 33 (3)
Published: 2026-05-04
DOI: 10.1007/s10515-026-00621-0
URL: https://link.springer.com/article/10.1007/s10515-026-00621-0
PDF: https://link.springer.com/content/pdf/10.1007/s10515-026-00621-0.pdf
Impact factor: 3.1; JCR: Q3 (Computer Science, Software Engineering)
Citations: 0
Abstract
Research software has become a central pillar of scientific discovery, yet its engineering quality, sustainability, and reproducibility vary widely across projects. At the same time, advances in artificial intelligence (AI), particularly generative AI (GenAI), are rapidly transforming how software is developed. While these tools promise productivity gains, their broader impact on research software engineering practices remains poorly understood at scale. In this study, we present a large-scale empirical analysis of AI-assisted research software engineering. We analyzed 1,510 open-source research software repositories retrieved from Zenodo using the IEEE Taxonomy 2025 top-level categories (598 query terms), restricted to records labeled Software and created after November 2022 (post-GenAI emergence), with duplicate and incomplete entries removed. To distinguish archival dissemination from active development, we separate Zenodo-only artifacts from records linked to evolving GitHub repositories and enrich the latter with repository-level development indicators. Our analysis integrates multiple dimensions, including software engineering maturity (e.g., documentation, automation, testing, and releases), FAIRness for research software (FAIR4RS metadata indicators), inferred AI and GenAI usage, and operational signals related to AIOps and MLOps practices. Based on these indicators, we propose and empirically ground a quadrant-based model that characterizes research software development modes along the axes of engineering maturity and AI integration. The results show that AI-assisted practices are increasingly present in research software, but their adoption remains uneven and often decoupled from established engineering disciplines. Repositories classified as AI4RSE exhibit longer active lifespans, stronger maintenance signals, and higher FAIR alignment than exploratory or informally developed projects. 
At the same time, a substantial fraction of Zenodo artifacts represent archival snapshots rather than evolving software, highlighting the importance of interpreting engineering indicators in light of dissemination intent. This work contributes (i) a large-scale empirical characterization of AI-assisted research software development based on 1,510 repositories, (ii) an integrated analytical framework combining software engineering, FAIRness, AI usage, and operational practices, and (iii) evidence-based insights into the opportunities and challenges of responsible and sustainable AI4RSE. Together, these contributions provide a foundation for future research and practical guidance on integrating AI into research software engineering.
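As a rough illustration of the quadrant-based model described in the abstract, the sketch below places a repository along the two axes of engineering maturity and AI integration. It is not the authors' implementation: the 0..1 scores, the 0.5 cut-off, the `Repo` type, and the quadrant labels are all hypothetical, since the abstract does not state how the axes are scored or split.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the abstract does not say how the axes are divided.
THRESHOLD = 0.5


@dataclass
class Repo:
    name: str
    maturity: float        # assumed 0..1 score (documentation, automation, testing, releases)
    ai_integration: float  # assumed 0..1 score (inferred AI and GenAI usage)


def quadrant(repo: Repo, threshold: float = THRESHOLD) -> str:
    """Return an illustrative quadrant label; the labels are placeholders."""
    mature = repo.maturity >= threshold
    uses_ai = repo.ai_integration >= threshold
    if mature and uses_ai:
        return "mature, AI-integrated"
    if mature:
        return "mature, low-AI"
    if uses_ai:
        return "informal, AI-integrated"
    return "informal, low-AI"


if __name__ == "__main__":
    # Made-up example values, for demonstration only.
    for repo in (Repo("example-a", 0.9, 0.8), Repo("example-b", 0.2, 0.1)):
        print(repo.name, "->", quadrant(repo))
```

A single shared threshold keeps the sketch short; a real analysis would more likely derive separate cut-offs per axis from the observed score distributions.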
About the journal:
This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.