Case Study Comparing ROC and PRC Curves for Imbalanced Data

Proceedings of the Annual Conference of the Prognostics and Health Management Society. Prognostics and Health Management Society. Conference. Pub Date: 2023-10-26. DOI: 10.36001/phmconf.2023.v15i1.3479

Dan Watson, Karl Reichard, Aaron Isaacson

Citations: 0

Abstract

Case Study Comparing ROC and PRC Curves for Imbalanced Data

Receiver operating characteristic (ROC) curves are a mainstay of binary classification and have seen widespread use since their inception in 1941 for characterizing radar receivers. Widely used and accepted, the ROC curve is the default option in many application spaces. Building on prior work, the Prognostics and Health Management (PHM) community naturally adopted ROC curves to visualize classifier performance. While the ROC curve is perhaps the best-known visualization of binary classifier performance, it is not the only game in town. Authors across various STEM fields have published works extolling other metrics and visualizations for evaluating binary classifier performance. These include, but are not limited to, the precision-recall characteristic (PRC) curve, area-under-the-curve metrics, bookmaker informedness, and markedness. This paper reviews these visualizations and metrics, provides references to more exhaustive treatments of them, and presents a case study of their use on an imbalanced prognostic health management dataset. Prognostic health management binary classification problems are often highly imbalanced, with a low prevalence of positive (faulty) cases compared to negative (nominal/healthy) cases. In the presented dataset, time-domain accelerometer data from a series of run-to-failure ball-on-disk scuffing tests provide a case where the vast majority of the data, more than 94%, comes from nominally healthy instances. A condition indicator algorithm targeting the hypothesized physical system response is validated against less informed classifiers. Several characteristic curves are then used to showcase the performance improvement of the physics-informed condition indicator.
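To illustrate why the abstract pairs ROC curves with precision-recall curves, informedness, and markedness, the following is a minimal sketch and not the authors' method. It uses the standard confusion-matrix definitions of these metrics. The function names, the sample size of 1000, and the operating point (true positive rate 0.9, false positive rate 0.1) are assumptions chosen for illustration. The 6% prevalence only echoes the "more than 94% healthy" figure and is not taken from the paper's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return common binary-classifier metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # recall / sensitivity (ROC y-axis, PRC x-axis)
    fpr = fp / (fp + tn)        # fall-out (ROC x-axis)
    precision = tp / (tp + fp)  # positive predictive value (PRC y-axis)
    npv = tn / (tn + fn)        # negative predictive value
    return {
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "informedness": tpr - fpr,            # bookmaker informedness
        "markedness": precision + npv - 1.0,  # markedness
    }


def counts_at_prevalence(n, prevalence, tpr, fpr):
    """Hypothetical helper: confusion counts (tp, fp, fn, tn) for a fixed
    ROC operating point applied to n samples at a given positive prevalence."""
    pos = round(n * prevalence)
    neg = n - pos
    tp = round(pos * tpr)
    fp = round(neg * fpr)
    return tp, fp, pos - tp, neg - fp


# Same ROC operating point, two class balances (illustrative numbers only).
imbalanced = binary_metrics(*counts_at_prevalence(1000, 0.06, 0.9, 0.1))
balanced = binary_metrics(*counts_at_prevalence(1000, 0.50, 0.9, 0.1))

for name, m in (("imbalanced", imbalanced), ("balanced", balanced)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Under these assumed numbers the ROC point and the informedness are identical in both cases, while precision falls from 0.90 in the balanced case to roughly 0.36 at 6% prevalence. ROC-based summaries do not show this dependence on prevalence, and precision-based views such as the PRC do.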
