Studying SATD in drone systems with Human-AI collaboration
Leevi Rantala, Lwin Khin Shar, Mika V. Mäntylä, Wei Minn, Yan Naing Tun
Journal of Systems and Software, volume 231, Article 112625. Published 2025-09-12.
DOI: 10.1016/j.jss.2025.112625
https://www.sciencedirect.com/science/article/pii/S0164121225002948
Abstract
Background:
Self-Admitted Technical Debt (SATD) refers to sub-optimal solutions that developers acknowledge within the source code. SATD research originated on Java projects but is expanding to other domains. We focus on SATD in drones, which are used for various critical tasks.
Aims:
The primary objective is to investigate SATD in drone systems. The second aim is to explore the integration of AI and human collaboration for SATD labelling and classification.
Method:
We conducted a sample study of SATD comments in drone systems (14 open-source systems and 4 SDKs) to analyse the quantity and types of SATD comments present. Our study incorporates collaboration between AI and humans by utilising an LLM for SATD classification. Additionally, we classified a sample of 385 SATD comments as either drone-specific or non-drone-specific.
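The abstract does not say how the sample of 385 comments was sized. The figure does coincide with the minimum sample commonly derived from Cochran's formula for estimating a proportion in a large population at 95% confidence with a 5% margin of error. The sketch below shows that arithmetic only; the parameter values are assumptions, not figures reported by the authors, and `cochran_sample_size` is a hypothetical helper name.

```python
import math


def cochran_sample_size(z: float = 1.96, p: float = 0.5, margin: float = 0.05) -> int:
    """Minimum sample size for estimating a proportion in a large population.

    z      -- z-score for the confidence level (1.96 corresponds to 95%)
    p      -- assumed proportion (0.5 is the most conservative choice)
    margin -- acceptable margin of error

    All defaults are assumptions for illustration; the abstract does not
    state which values, if any, the authors used.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)


print(cochran_sample_size())  # 385
```

With the defaults the unrounded value is 384.16, which rounds up to 385.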
Results:
The most prevalent SATD categories in drone software are Code Debt (35%), Unclassifiable Debt (16%), and Design Debt (15%). We found that 22% of SATD is specific to drones. Drone-specific SATD is proportionally more focused on Requirements and Design Debt than non-drone-specific SATD. We found that combining human and LLM classification can improve accuracy, as both the LLM and the humans revised their initial ratings. After two rounds, “near-perfect agreement” (Fleiss’ kappa 0.83) was achieved.
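For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how Fleiss' kappa is computed from a table of rating counts. It is not the authors' analysis code: the function name, the three-rater setup, and the toy ratings are all hypothetical, since the abstract reports neither the number of raters nor the raw ratings.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    n_categories = len(counts[0])
    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(p_item) / n_items      # mean observed agreement
    expected = sum(p * p for p in p_cat)  # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 5 SATD comments, 3 raters, and the columns
# (Code Debt, Design Debt, Requirements Debt). Not data from the study.
toy_counts = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 0, 3],
    [3, 0, 0],
]
print(round(fleiss_kappa(toy_counts), 2))
```

On the commonly cited Landis and Koch scale, values above 0.80 are labelled almost perfect agreement, which is the band the reported 0.83 falls into.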
Conclusions:
Future studies should investigate whether our observation that domain-specific (drone) SATD comments relate more to Requirements Debt holds true in other domains. We propose a workflow that integrates AI into classification tasks, enhancing the accuracy of both human and AI classifications.