{"title":"利用启发式语言模式自动识别性能问题报告的平台诊断框架","authors":"Yutong Zhao;Lu Xiao;Sunny Wong","doi":"10.1109/TSE.2024.3390623","DOIUrl":null,"url":null,"abstract":"Software performance is critical for system efficiency, with performance issues potentially resulting in budget overruns, project delays, and market losses. Such problems are reported to developers through issue tracking systems, which are often under-tagged, as the manual tagging process is voluntary and time-consuming. Existing automated performance issue tagging techniques, such as keyword matching and machine/deep learning models, struggle due to imbalanced datasets and a high degree of variance. This paper presents a novel hybrid classification approach, combining Heuristic Linguistic Patterns (\n<italic>HLP</i>\ns) with machine/deep learning models to enable practitioners to automatically identify performance-related issues. The proposed approach works across three progressive levels: \n<italic>HLP</i>\n tagging, sentence tagging, and issue tagging, with a focus on linguistic analysis of issue descriptions. The authors evaluate the approach on three different datasets collected from different projects and issue-tracking platforms to prove that the proposed framework is accurate, project- and platform-agnostic, and robust to imbalanced datasets. Furthermore, this study also examined how the two unique techniques of the framework, including the fuzzy \n<italic>HLP</i>\n matching and the \n<italic>Issue HLP Matrix</i>\n, contribute to the accuracy. Finally, the study explored the effectiveness and impact of two off-the-shelf feature selection techniques, \n<italic>Boruta</i>\n and \n<italic>RFE</i>\n, with the proposed framework. The results showed that the proposed framework has great potential for practitioners to accurately (with up to 100% precision, 66% recall, and 79% \n<italic>F1</i>\n-score) identify performance issues, with robustness to imbalanced data and good transferability to new projects and issue tracking platforms.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":null,"pages":null},"PeriodicalIF":6.5000,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10504708","citationCount":"0","resultStr":"{\"title\":\"A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports With Heuristic Linguistic Patterns\",\"authors\":\"Yutong Zhao;Lu Xiao;Sunny Wong\",\"doi\":\"10.1109/TSE.2024.3390623\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Software performance is critical for system efficiency, with performance issues potentially resulting in budget overruns, project delays, and market losses. Such problems are reported to developers through issue tracking systems, which are often under-tagged, as the manual tagging process is voluntary and time-consuming. Existing automated performance issue tagging techniques, such as keyword matching and machine/deep learning models, struggle due to imbalanced datasets and a high degree of variance. This paper presents a novel hybrid classification approach, combining Heuristic Linguistic Patterns (\\n<italic>HLP</i>\\ns) with machine/deep learning models to enable practitioners to automatically identify performance-related issues. The proposed approach works across three progressive levels: \\n<italic>HLP</i>\\n tagging, sentence tagging, and issue tagging, with a focus on linguistic analysis of issue descriptions. The authors evaluate the approach on three different datasets collected from different projects and issue-tracking platforms to prove that the proposed framework is accurate, project- and platform-agnostic, and robust to imbalanced datasets. Furthermore, this study also examined how the two unique techniques of the framework, including the fuzzy \\n<italic>HLP</i>\\n matching and the \\n<italic>Issue HLP Matrix</i>\\n, contribute to the accuracy. Finally, the study explored the effectiveness and impact of two off-the-shelf feature selection techniques, \\n<italic>Boruta</i>\\n and \\n<italic>RFE</i>\\n, with the proposed framework. The results showed that the proposed framework has great potential for practitioners to accurately (with up to 100% precision, 66% recall, and 79% \\n<italic>F1</i>\\n-score) identify performance issues, with robustness to imbalanced data and good transferability to new projects and issue tracking platforms.\",\"PeriodicalId\":13324,\"journal\":{\"name\":\"IEEE Transactions on Software Engineering\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2024-04-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10504708\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Software Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10504708/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10504708/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports With Heuristic Linguistic Patterns
Software performance is critical for system efficiency, with performance issues potentially resulting in budget overruns, project delays, and market losses. Such problems are reported to developers through issue tracking systems, which are often under-tagged, as the manual tagging process is voluntary and time-consuming. Existing automated performance issue tagging techniques, such as keyword matching and machine/deep learning models, struggle due to imbalanced datasets and a high degree of variance. This paper presents a novel hybrid classification approach, combining Heuristic Linguistic Patterns (
HLP
s) with machine/deep learning models to enable practitioners to automatically identify performance-related issues. The proposed approach works across three progressive levels:
HLP
tagging, sentence tagging, and issue tagging, with a focus on linguistic analysis of issue descriptions. The authors evaluate the approach on three different datasets collected from different projects and issue-tracking platforms to prove that the proposed framework is accurate, project- and platform-agnostic, and robust to imbalanced datasets. Furthermore, this study also examined how the two unique techniques of the framework, including the fuzzy
HLP
matching and the
Issue HLP Matrix
, contribute to the accuracy. Finally, the study explored the effectiveness and impact of two off-the-shelf feature selection techniques,
Boruta
and
RFE
, with the proposed framework. The results showed that the proposed framework has great potential for practitioners to accurately (with up to 100% precision, 66% recall, and 79%
F1
-score) identify performance issues, with robustness to imbalanced data and good transferability to new projects and issue tracking platforms.
期刊介绍:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.