{"title":"改进对不平衡数据集的日志预测:一个开源Java项目的案例研究","authors":"Sangeeta Lal, Neetu Sardana, A. Sureka","doi":"10.4018/IJOSSP.2016040103","DOIUrl":null,"url":null,"abstract":"Logging is an important yet tough decision for OSS developers. Machine-learning models are useful in improving several steps of OSS development, including logging. Several recent studies propose machine-learning models to predict logged code construct. The prediction performances of these models are limited due to the class-imbalance problem since the number of logged code constructs is small as compared to non-logged code constructs. No previous study analyzes the class-imbalance problem for logged code construct prediction. The authors first analyze the performances of J48, RF, and SVM classifiers for catch-blocks and if-blocks logged code constructs prediction on imbalanced datasets. Second, the authors propose LogIm, an ensemble and threshold-based machine-learning model. Third, the authors evaluate the performance of LogIm on three open-source projects. On average, LogIm model improves the performance of baseline classifiers, J48, RF, and SVM, by 7.38%, 9.24%, and 4.6% for catch-blocks, and 12.11%, 14.95%, and 19.13% for if-blocks logging prediction.","PeriodicalId":53605,"journal":{"name":"International Journal of Open Source Software and Processes","volume":"72 1","pages":"43-71"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Improving Logging Prediction on Imbalanced Datasets: A Case Study on Open Source Java Projects\",\"authors\":\"Sangeeta Lal, Neetu Sardana, A. Sureka\",\"doi\":\"10.4018/IJOSSP.2016040103\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Logging is an important yet tough decision for OSS developers. Machine-learning models are useful in improving several steps of OSS development, including logging. Several recent studies propose machine-learning models to predict logged code construct. The prediction performances of these models are limited due to the class-imbalance problem since the number of logged code constructs is small as compared to non-logged code constructs. No previous study analyzes the class-imbalance problem for logged code construct prediction. The authors first analyze the performances of J48, RF, and SVM classifiers for catch-blocks and if-blocks logged code constructs prediction on imbalanced datasets. Second, the authors propose LogIm, an ensemble and threshold-based machine-learning model. Third, the authors evaluate the performance of LogIm on three open-source projects. 
On average, LogIm model improves the performance of baseline classifiers, J48, RF, and SVM, by 7.38%, 9.24%, and 4.6% for catch-blocks, and 12.11%, 14.95%, and 19.13% for if-blocks logging prediction.\",\"PeriodicalId\":53605,\"journal\":{\"name\":\"International Journal of Open Source Software and Processes\",\"volume\":\"72 1\",\"pages\":\"43-71\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Open Source Software and Processes\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4018/IJOSSP.2016040103\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Open Source Software and Processes","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/IJOSSP.2016040103","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}
Improving Logging Prediction on Imbalanced Datasets: A Case Study on Open Source Java Projects
Logging is an important yet difficult decision for OSS developers. Machine-learning models are useful for improving several steps of OSS development, including logging. Several recent studies propose machine-learning models to predict logged code constructs. The prediction performance of these models is limited by the class-imbalance problem: the number of logged code constructs is small compared to the number of non-logged code constructs. No previous study has analyzed the class-imbalance problem in logged code construct prediction. The authors first analyze the performance of the J48, RF, and SVM classifiers for predicting logged catch-blocks and if-blocks on imbalanced datasets. Second, the authors propose LogIm, an ensemble and threshold-based machine-learning model. Third, the authors evaluate the performance of LogIm on three open-source projects. On average, the LogIm model improves the performance of the baseline classifiers J48, RF, and SVM by 7.38%, 9.24%, and 4.6% for catch-block logging prediction, and by 12.11%, 14.95%, and 19.13% for if-block logging prediction.
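The abstract characterizes LogIm as an ensemble and threshold-based model but gives no implementation detail. The sketch below is only a minimal illustration of that general idea, not the authors' LogIm code: it trains two of the paper's baseline learners with Weka (J48 and RandomForest), averages their class-probability estimates, and lowers the decision threshold for the minority "logged" class. The ARFF file name, the class layout, and the 0.3 threshold are assumptions for illustration.

```java
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomForest;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ThresholdEnsembleSketch {

    public static void main(String[] args) throws Exception {
        // Load a feature dataset of code constructs (the ARFF path is a placeholder);
        // the last attribute is assumed to be the binary class: logged vs. non-logged.
        Instances data = new DataSource("catch_blocks.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Two of the paper's baseline learners, trained as an ensemble.
        Classifier[] ensemble = { new J48(), new RandomForest() };
        for (Classifier c : ensemble) {
            c.buildClassifier(data);
        }

        // Index of the minority "logged" class (assumed to be class value 1).
        int loggedIdx = 1;
        // A decision threshold below 0.5 compensates for the class imbalance;
        // the value 0.3 is illustrative, not taken from the paper.
        double threshold = 0.3;

        // Score each instance (on the training data here, purely to show the loop;
        // a real evaluation would use a held-out test set or cross-validation).
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);

            // Average the class-probability estimates of the ensemble members.
            double loggedProb = 0.0;
            for (Classifier c : ensemble) {
                loggedProb += c.distributionForInstance(inst)[loggedIdx];
            }
            loggedProb /= ensemble.length;

            // Threshold moving: predict "logged" whenever the averaged
            // probability exceeds the lowered threshold.
            boolean predictLogged = loggedProb >= threshold;
            System.out.printf("instance %d -> logged=%b (p=%.2f)%n", i, predictLogged, loggedProb);
        }
    }
}
```

The same loop could include Weka's SMO (SVM) as a third ensemble member; the key point is that the predicted label comes from a tunable threshold on averaged probabilities rather than each classifier's default 0.5 cutoff.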
About the journal:
The International Journal of Open Source Software and Processes (IJOSSP) publishes high-quality, peer-reviewed, original research articles in the broad field of open source software and processes. This wide area entails many intriguing questions and facets, including the distinctive development process carried out by large numbers of geographically dispersed programmers, community issues such as coordination and communication, the motivations of participants, and economic and legal issues. Beyond this, open source software is an example of a highly distributed, user-led innovation process, so many of its aspects have relevance beyond the realm of software and its development. In this tradition, IJOSSP also publishes papers on these topics. IJOSSP is a multi-disciplinary outlet and welcomes submissions from all relevant fields of research, employing a multitude of research approaches.