Empirical Software Engineering最新文献_第4页

Adopting automated bug assignment in practice — a longitudinal case study at Ericsson 在实践中采用自动错误分派--爱立信纵向案例研究

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-30 DOI: 10.1007/s10664-024-10507-y

Markus Borg, Leif Jonsson, Emelie Engström, Béla Bartalos, Attila Szabó

{"title":"Adopting automated bug assignment in practice — a longitudinal case study at Ericsson","authors":"Markus Borg, Leif Jonsson, Emelie Engström, Béla Bartalos, Attila Szabó","doi":"10.1007/s10664-024-10507-y","DOIUrl":"https://doi.org/10.1007/s10664-024-10507-y","url":null,"abstract":"<p>[Context] The continuous inflow of bug reports is a considerable challenge in large development projects. Inspired by contemporary work on mining software repositories, we designed a prototype bug assignment solution based on machine learning in 2011-2016. The prototype evolved into an internal Ericsson product, TRR, in 2017-2018. TRR’s first bug assignment without human intervention happened in April 2019. [Objective] Our study evaluates the adoption of TRR within its industrial context at Ericsson, i.e., we provide lessons learned related to the productization of a research prototype within a company. Moreover, we investigate 1) how TRR performs in the field, 2) what value TRR provides to Ericsson, and 3) how TRR has influenced the ways of working. [Method] We conduct a preregistered industrial case study combining interviews with TRR stakeholders, minutes from sprint planning meetings, and bug-tracking data. The data analysis includes thematic analysis, descriptive statistics, and Bayesian causal analysis. [Results] TRR is now an incorporated part of the bug assignment process. Considering the abstraction levels of the telecommunications stack, high-level modules are more positive while low-level modules experienced some drawbacks. Most importantly, some bug reports directly reach low-level modules without first having passed through fundamental root-cause analysis steps at higher levels. On average, TRR automatically assigns 30% of the incoming bug reports with an accuracy of 75%. Auto-routed TRs are resolved around 21% faster within Ericsson, and TRR has saved highly seasoned engineers many hours of work. Indirect effects of adopting TRR include process improvements, process awareness, increased communication, and higher job satisfaction. [Conclusions] TRR has saved time at Ericsson, but the adoption of automated bug assignment was more intricate compared to similar endeavors reported from other companies. We primarily attribute the difference to the very large size of the organization and the complex products. Key facilitators in the successful adoption include a gradual introduction, product champions, and careful stakeholder analysis.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"74 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141872637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Testing the past: can we still run tests in past snapshots for Java projects? 测试过去：我们还能在 Java 项目的过去快照中运行测试吗？

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-30 DOI: 10.1007/s10664-024-10530-z

Michel Maes-Bermejo, Micael Gallego, Francisco Gortázar, Gregorio Robles, Jesus M. Gonzalez-Barahona

{"title":"Testing the past: can we still run tests in past snapshots for Java projects?","authors":"Michel Maes-Bermejo, Micael Gallego, Francisco Gortázar, Gregorio Robles, Jesus M. Gonzalez-Barahona","doi":"10.1007/s10664-024-10530-z","DOIUrl":"https://doi.org/10.1007/s10664-024-10530-z","url":null,"abstract":"<p>Building past snapshots of a software project has shown to be of interest both for researchers and practitioners. However, little attention has been devoted specifically to tests available in those past snapshots, which are fundamental for the maintenance of old versions still in production. The aim of this study is to determine to which extent tests of past snapshots can be executed successfully, which would mean these past snapshots are still testable. Given a software project, we build all its past snapshots from source code, including tests, and then run the tests. When tests do not result in success, we also record the reasons, allowing us to determine factors that make tests fail. We run this method on a total of 86 Java projects. On average, for 52.53% of the project snapshots on which tests can be built, all tests pass. However, on average, 94.14% of tests pass in previous snapshots when we take into account the percentage of tests passing in the snapshots used for building those tests. In real software projects, successfully running tests in past snapshots is not something that we can take for granted: we have found that in a large proportion of the projects we studied this does not happen frequently. We have found that the building from source code is the main limitation when running tests on past snapshots. However, we have found some projects for which tests run successfully in a very large fraction of past snapshots, which allows us to identify good practices. We also provide a framework and metrics to quantify testability (the extent to which we are able to run tests of a snapshot with a success result) of past snapshots from several points of view, which simplifies new analyses on this matter, and could help to measure how any project performs in this respect.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"78 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141872638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Investigating the online recruitment and selection journey of novice software engineers: Anti-patterns and recommendations 调查软件工程师新手的在线招聘和选拔历程：反模式与建议

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-30 DOI: 10.1007/s10664-024-10498-w

Miguel Setúbal, Tayana Conte, Marcos Kalinowski, Allysson Allex Araújo

{"title":"Investigating the online recruitment and selection journey of novice software engineers: Anti-patterns and recommendations","authors":"Miguel Setúbal, Tayana Conte, Marcos Kalinowski, Allysson Allex Araújo","doi":"10.1007/s10664-024-10498-w","DOIUrl":"https://doi.org/10.1007/s10664-024-10498-w","url":null,"abstract":"<p>The growing software development market has increased the demand for qualified professionals in Software Engineering (SE). To this end, companies must enhance their Recruitment and Selection (R&S) processes to maintain high-quality teams, including opening opportunities for beginners, such as trainees and interns. However, given the various judgments and sociotechnical factors involved, this complex process of R&S poses a challenge for recent graduates seeking to enter the market. This paper aims to identify a set of anti-patterns and recommendations for early career SE professionals concerning R&S processes. Under an exploratory and qualitative methodological approach, we conducted six online Focus Groups with 18 recruiters with experience in R&S in the software industry. After completing our qualitative analysis, we identified 12 anti-patterns and 31 actionable recommendations regarding the hiring process focused on entry-level SE professionals. The identified anti-patterns encompass behavioral and technical dimensions innate to R&S processes. These findings provide a rich opportunity for reflection in the SE industry and offer valuable guidance for early-career candidates and organizations. From an academic perspective, this work also raises awareness of the intersection of Human Resources and SE, an area with considerable potential to be expanded in the context of cooperative and human aspects of SE.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"44 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141872639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

App review driven collaborative bug finding 应用程序审查驱动的协作式错误查找

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-26 DOI: 10.1007/s10664-024-10489-x

Xunzhu Tang, Haoye Tian, Pingfan Kong, Saad Ezzini, Kui Liu, Xin Xia, Jacques Klein, Tegawendé F. Bissyandé

{"title":"App review driven collaborative bug finding","authors":"Xunzhu Tang, Haoye Tian, Pingfan Kong, Saad Ezzini, Kui Liu, Xin Xia, Jacques Klein, Tegawendé F. Bissyandé","doi":"10.1007/s10664-024-10489-x","DOIUrl":"https://doi.org/10.1007/s10664-024-10489-x","url":null,"abstract":"<p>Software development teams generally welcome any effort to expose bugs in their code base. In this work, we build on the hypothesis that mobile apps from the same category (e.g., two web browser apps) may be affected by similar bugs in their evolution process. It is therefore possible to transfer the experience of one historical app to quickly find bugs in its new counterparts. This has been referred to as collaborative bug finding in the literature. Our novelty is that we guide the bug finding process by considering that existing bugs have been hinted within app reviews. Concretely, we design the <span>BugRMSys</span> approach to recommend bug reports for a target app by matching historical bug reports from apps in the same category with user app reviews of the target app. We experimentally show that this approach enables us to quickly expose and report dozens of bugs for targeted apps such as Brave (web browser app). <span>BugRMSys</span> ’s implementation relies on DistilBERT to produce natural language text embeddings. Our pipeline considers similarities between bug reports and app reviews to identify relevant bugs. We then focus on the app review as well as potential reproduction steps in the historical bug report (from a same-category app) to reproduce the bugs. Overall, after applying <span>BugRMSys</span> to six popular apps, we were able to identify, reproduce and report 20 new bugs: among these, 9 reports have been already triaged, 6 were confirmed, and 4 have been fixed by official development teams.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"245 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Neuron importance-aware coverage analysis for deep neural network testing 用于深度神经网络测试的神经元重要性感知覆盖率分析

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-25 DOI: 10.1007/s10664-024-10524-x

Hongjing Guo, Chuanqi Tao, Zhiqiu Huang

{"title":"Neuron importance-aware coverage analysis for deep neural network testing","authors":"Hongjing Guo, Chuanqi Tao, Zhiqiu Huang","doi":"10.1007/s10664-024-10524-x","DOIUrl":"https://doi.org/10.1007/s10664-024-10524-x","url":null,"abstract":"<p>Deep Neural Network (DNN) models are widely used in many cutting-edge domains, such as medical diagnostics and autonomous driving. However, an urgent need to test DNN models thoroughly has increasingly risen. Recent research proposes various structural and non-structural coverage criteria to measure test adequacy. Structural coverage criteria quantify the degree to which the internal elements of DNN models are covered by a test suite. However, they convey little information about individual inputs and exhibit limited correlation with defect detection. Additionally, existing non-structural coverage criteria are unaware of neurons’ importance to decision-making. This paper addresses these limitations by proposing novel non-structural coverage criteria. By tracing neurons’ cumulative contribution to the final decision on the training set, this paper identifies important neurons of DNN models. A novel metric is proposed to quantify the difference in important neuron behavior between a test input and the training set, which provides a measured way at individual test input granularity. Additionally, two non-structural coverage criteria are introduced that allow for the quantification of test adequacy by examining differences in important neuron behavior between the testing and the training set. The empirical evaluation of image datasets demonstrates that the proposed metric outperforms the existing non-structural adequacy metrics by up to 14.7% accuracy improvement in capturing error-revealing test inputs. Compared with state-of-the-art coverage criteria, the proposed coverage criteria are more sensitive to errors, including natural errors and adversarial examples.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"90 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

The role of psychological safety in promoting software quality in agile teams 心理安全对提高敏捷团队软件质量的作用

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-25 DOI: 10.1007/s10664-024-10512-1

Adam Alami, Mansooreh Zahedi, Oliver Krancher

{"title":"The role of psychological safety in promoting software quality in agile teams","authors":"Adam Alami, Mansooreh Zahedi, Oliver Krancher","doi":"10.1007/s10664-024-10512-1","DOIUrl":"https://doi.org/10.1007/s10664-024-10512-1","url":null,"abstract":"<p>Psychological safety continues to pique the interest of scholars in a variety of disciplines of study. Recent research indicates that psychological safety fosters knowledge sharing and norm clarity and complements agile values. Although software quality remains a concern in the software industry, academics have yet to investigate whether and how psychologically safe teams provide superior results. In this study, we explore how psychological safety influences agile teams’ quality-related behaviors aimed at enhancing software quality. To widen the empirical coverage and evaluate the results, we chose a two-phase mixed-methods research design with an exploratory qualitative phase (20 interviews) followed by a quantitative phase (survey study, N = 423). Our findings show that, when psychological safety is established in agile software teams, it induces enablers of a social nature that advance the teams’ ability to pursue software quality. For example, admitting mistakes and taking initiatives equally help teams learn and invest their learning in their future decisions related to software quality. Past mistakes become points of reference for avoiding them in the future. Individuals become more willing to take initiatives aimed at enhancing quality practices and mitigating software quality issues. We contribute to our endeavor to understand the circumstances that promote software quality. Psychological safety requires organizations, their management, agile teams, and individuals to maintain and propagate safety principles. Our results also suggest that technological tools and procedures can be utilized alongside social strategies to promote software quality.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"12 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An empirical study on the potential of word embedding techniques in bug report management tasks 单词嵌入技术在错误报告管理任务中的潜力实证研究

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-25 DOI: 10.1007/s10664-024-10510-3

Bingting Chen, Weiqin Zou, Biyu Cai, Qianshuang Meng, Wenjie Liu, Piji Li, Lin Chen

{"title":"An empirical study on the potential of word embedding techniques in bug report management tasks","authors":"Bingting Chen, Weiqin Zou, Biyu Cai, Qianshuang Meng, Wenjie Liu, Piji Li, Lin Chen","doi":"10.1007/s10664-024-10510-3","DOIUrl":"https://doi.org/10.1007/s10664-024-10510-3","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Representing the textual semantics of bug reports is a key component of bug report management (BRM) techniques. Existing studies mainly use classical information retrieval-based (IR-based) approaches, such as the vector space model (VSM) to do semantic extraction. Little attention is paid to exploring whether word embedding (WE) models from the natural language process could help BRM tasks.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>To have a general view of the potential of word embedding models in representing the semantics of bug reports and attempt to provide some actionable guidelines in using semantic retrieval models for BRM tasks.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We studied the efficacy of five widely recognized WE models for six BRM tasks on 20 widely-used products from the Eclipse and Mozilla foundations. Specifically, we first explored the suitable machine learning techniques under the use of WE models and the suitable WE model for BRM tasks. Then we studied whether WE models performed better than classical VSM. Last, we investigated whether WE models fine-tuned with bug reports outperformed general pre-trained WE models.</p><h3 data-test=\"abstract-sub-heading\">Key Results</h3><p>The Random Forest (RF) classifier outperformed other typical classifiers under the use of different WE models in semantic extraction.We rarely observed statistically significant performance differences among five WE models in five BRM classification tasks, but we found that small-dimensional WE models performed better than larger ones in the duplicate bug report detection task. Among three BRM tasks (i.e., bug severity prediction, reopened bug prediction, and duplicate bug report detection) that showed statistically significant performance differences, VSM outperformed the studied WE models. We did not find performance improvement after we fine-tuned general pre-trained BERT with bug report data.</p><h3 data-test=\"abstract-sub-heading\">Conclusion</h3><p>Performance improvements of using pre-trained WE models were not observed in studied BRM tasks. The combination of RF and traditional VSM was found to achieve the best performance in various BRM tasks.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"55 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

The impact of concept drift and data leakage on log level prediction models 概念漂移和数据泄露对日志级别预测模型的影响

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-25 DOI: 10.1007/s10664-024-10518-9

Youssef Esseddiq Ouatiti, Mohammed Sayagh, Noureddine Kerzazi, Bram Adams, Ahmed E. Hassan

{"title":"The impact of concept drift and data leakage on log level prediction models","authors":"Youssef Esseddiq Ouatiti, Mohammed Sayagh, Noureddine Kerzazi, Bram Adams, Ahmed E. Hassan","doi":"10.1007/s10664-024-10518-9","DOIUrl":"https://doi.org/10.1007/s10664-024-10518-9","url":null,"abstract":"<p>Developers insert logging statements to collect information about the execution of their systems. Along with a logging framework (e.g., Log4j), practitioners can decide which log statement to print or suppress by tagging each log line with a log level. Since picking the right log level for a new logging statement is not straightforward, machine learning models for log level prediction (LLP) were proposed by prior studies. While these models show good performances, they are still subject to the context in which they are applied, specifically to the way practitioners decide on log levels in different phases of the development history of their projects (e.g., debugging vs. testing). For example, Openstack developers interchangeably increased/decreased the verbosity of their logs across the history of the project in response to code changes (e.g., before vs after fixing a new bug). Thus, the manifestation of these changing log verbosity choices across time can lead to concept drift and data leakage issues, which we wish to quantify in this paper on LLP models. In this paper, we empirically quantify the impact of data leakage and concept drift on the performance and interpretability of LLP models in three large open-source systems. Additionally, we compare the performance and interpretability of several time-aware approaches to tackle time-related issues. We observe that both shallow and deep-learning-based models suffer from both time-related issues. We also observe that training a model on just a window of the historical data (i.e., contextual model) outperforms models that are trained on the whole historical data (i.e., all-knowing model) in the case of our shallow LLP model. Finally, we observe that contextual models exhibit a different (even contradictory) model interpretability, with a (very) weak correlation between the ranking of important features of the pairs of contextual models we compared. Our findings suggest that data leakage and concept drift should be taken into consideration for LLP models. We also invite practitioners to include the size of the historical window as an additional hyperparameter to tune a suitable contextual model instead of leveraging all-knowing models.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"16 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Towards Trusted Smart Contracts: A Comprehensive Test Suite For Vulnerability Detection 实现可信的智能合约：漏洞检测综合测试套件

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-25 DOI: 10.1007/s10664-024-10509-w

Andrei Arusoaie, Ștefan-Claudiu Susan

{"title":"Towards Trusted Smart Contracts: A Comprehensive Test Suite For Vulnerability Detection","authors":"Andrei Arusoaie, Ștefan-Claudiu Susan","doi":"10.1007/s10664-024-10509-w","DOIUrl":"https://doi.org/10.1007/s10664-024-10509-w","url":null,"abstract":"<p>The term <i>smart contract</i> was originally used to describe automated legal contracts. Nowadays, it refers to special programs that run on blockchain platforms and are popular in decentralized applications. In recent years, vulnerabilities in smart contracts caused significant financial losses. Researchers have proposed methods and tools for detecting them and have demonstrated their effectiveness using various test suites. In this paper, we aim to improve the current approach to measuring the effectiveness of vulnerability detectors in smart contracts. First, we identify several traits of existing test suites used to assess tool effectiveness. We explain how these traits limit the evaluation and comparison of vulnerability detection tools. Next, we propose a new test suite that prioritizes diversity over quantity, utilizing a comprehensive taxonomy to achieve this. Our organized test suite enables insightful evaluations and more precise comparisons among vulnerability detection tools. We demonstrate the benefits of our test suite by comparing several vulnerability detection tools using two sets of metrics. Results show that the tools we included in our comparison cover less than half of the vulnerabilities in the new test suite. Finally, based on our results, we answer several questions that we pose in the introduction of the paper about the effectiveness of the compared tools.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"17 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Automatic title completion for Stack Overflow posts and GitHub issues 自动补全 Stack Overflow 帖子和 GitHub 问题的标题

IF 4.1 2区计算机科学

Empirical Software Engineering Pub Date : 2024-07-25 DOI: 10.1007/s10664-024-10513-0

Xiang Chen, Wenlong Pei, Shaoyu Yang, Yanlin Zhou, Zichen Zhang, Jiahua Pei

{"title":"Automatic title completion for Stack Overflow posts and GitHub issues","authors":"Xiang Chen, Wenlong Pei, Shaoyu Yang, Yanlin Zhou, Zichen Zhang, Jiahua Pei","doi":"10.1007/s10664-024-10513-0","DOIUrl":"https://doi.org/10.1007/s10664-024-10513-0","url":null,"abstract":"<p>Title quality is important for different software engineering communities. For example, in Stack Overflow, posts with low-quality question titles often discourage potential answerers. In GitHub, issues with low-quality titles can make it difficult for developers to grasp the core idea of the problem. In previous studies, researchers mainly focused on generating titles from scratch by analyzing the body contents, such as the post body for Stack Overflow question title generation (SOTG) and the issue body for issue title generation (ISTG). However, the quality of the generated titles is still limited by the information available in the body contents. A more effective way is to provide accurate completion suggestions when developers compose titles. Inspired by this idea, we are the first to study the problem of automatic title completion for software engineering title generation tasks and propose the approach <span>TC4SETG</span>. Specifically, we first preprocess the gathered titles to form incomplete titles (i.e., tip information provided by developers) for simulating the title completion scene. Then we construct the input by concatenating the incomplete title with the body’s content. Finally, we fine-tune the pre-trained model CodeT5 to learn the title completion patterns effectively. To evaluate the effectiveness of <span>TC4SETG</span>, we selected 189,655 high-quality posts from Stack Overflow by covering eight popular programming languages for the SOTG task and 333,563 issues in the top-200 starred repositories on GitHub for the ISTG task. Our empirical results show that compared with the approaches of generating question titles from scratch, our proposed approach <span>TC4SETG</span> is more practical in automatic and human evaluation. Our experimental results demonstrate that <span>TC4SETG</span> outperforms corresponding state-of-the-art baselines in the SOTG task by a minimum of 25.82% and in the ISTG task by at least 45.48% in terms of ROUGE-L. Therefore, our study provides a new direction for studying automatic software engineering title generation and calls for more researchers to investigate this direction in the future.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"23 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141776684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0