A Tale of Two Tasks: Automated Issue Priority Prediction with Deep Multi-task Learning

Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement. Pub Date: 2022-09-19. DOI: 10.1145/3544902.3546257

Yingling Li, Xing Che, Yuekai Huang, Junjie Wang, Song Wang, Yawen Wang, Qing Wang


Citations: 4

Abstract

Background. Issues are prevalent in software projects, and identifying the correct priority of reported issues is crucial for reducing maintenance effort and ensuring higher software quality. Several approaches exist for automatic priority prediction, yet they do not fully utilize the related information that might influence priority assignment. Our observation reveals noticeable correlations between an issue's priority and its category; e.g., an issue in the bug category tends to be assigned a higher priority than an issue in the document category. This correlation motivates us to employ multi-task learning, sharing knowledge from issue category prediction to facilitate priority prediction.

Aims. This paper aims to provide an automatic approach for effective issue priority prediction, to reduce the burden on project members and help them manage issues better.

Method. We propose PRIMA, an issue priority prediction approach based on deep multi-task learning, which takes issue category prediction as a second task to facilitate information sharing and learning. It consists of three main phases: 1) a data preparation and augmentation phase, which allows data sharing beyond single-task learning; 2) a model construction phase, which designs shared layers to encode the semantics of textual descriptions and task-specific layers to model the two tasks in parallel, and which also includes indicative attributes to better capture an issue's inherent meaning; 3) a model training phase, which enables eavesdropping through a loss function shared between the two tasks.

Results. Evaluations on four large-scale open-source projects show that PRIMA outperforms commonly used and state-of-the-art baselines, with 32% to 55% higher precision and 28% to 56% higher recall. Compared with single-task learning, the performance improvement reaches 18% in precision and 19% in recall. Results from our user study further demonstrate its potential practical value.

Conclusions. The proposed approach provides a novel and effective way to predict issue priority, and sheds light on jointly exploring other issue-management tasks.
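The abstract does not spell out how the loss shared between the two tasks is formed. As a rough illustration only, the sketch below assumes a common multi-task formulation: a softmax cross-entropy term for each task, combined as a weighted sum. The function names, the `category_weight` parameter, the class orderings, and the example logits are all hypothetical and are not taken from PRIMA.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the true class under softmax(logits)."""
    return -math.log(softmax(logits)[target_index])


def joint_loss(priority_logits, priority_label,
               category_logits, category_label,
               category_weight=1.0):
    """Assumed shared objective: priority loss plus a weighted category loss.

    The additive form and the weight are illustrative assumptions,
    not the formulation used in the paper.
    """
    priority_loss = cross_entropy(priority_logits, priority_label)
    category_loss = cross_entropy(category_logits, category_label)
    return priority_loss + category_weight * category_loss


# Hypothetical outputs of the two task-specific heads for one issue.
priority_logits = [2.0, 0.5, -1.0]  # assumed order: high, medium, low
category_logits = [1.5, -0.5]       # assumed order: bug, document

loss = joint_loss(priority_logits, 0, category_logits, 0)
print(f"joint loss: {loss:.4f}")
```

Because both terms are computed from outputs that pass through the same shared layers, minimizing such a sum lets gradients from the category task shape the representation used for priority prediction. Setting `category_weight` to 0 reduces the sketch to single-task learning.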


