信任分类：基于序列椭圆分区的监督方法

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2023-12-25 DOI:10.1109/TKDE.2023.3345658

Ranjani Niranjan;Sachit Rao

{"title":"信任分类：基于序列椭圆分区的监督方法","authors":"Ranjani Niranjan;Sachit Rao","doi":"10.1109/TKDE.2023.3345658","DOIUrl":null,"url":null,"abstract":"Standard metrics of performance of classifiers, such as accuracy and sensitivity, do not reveal the trust or confidence in the predicted labels of data. While other metrics such as the computed probability of a label or the signed distance from a hyperplane can act as a trust measure, these are subjected to heuristic thresholds. This paper presents a convex optimization-based supervised classifier that sequentially partitions a dataset into several ellipsoids, where each ellipsoid contains nearly all points of the same label. By stating classification rules based on this partitioning, Bayes’ formula is then applied to calculate a trust score to a label assigned to a test datapoint determined from these rules. The proposed Sequential Ellipsoidal Partitioning Classifier (SEP-C) exposes dataset irregularities, such as degree of overlap, without requiring a separate exploratory data analysis. The rules of classification, which are free of hyperparameters, are also not affected by class-imbalance, the underlying data distribution, or number of features. SEP-C does not require the use of non-linear kernels when the dataset is not linearly separable. The performance, and comparison with other methods, of SEP-C is demonstrated on the XOR-problem, circle dataset, and other open-source datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7757-7771"},"PeriodicalIF":8.9000,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Classification With Trust: A Supervised Approach Based on Sequential Ellipsoidal Partitioning\",\"authors\":\"Ranjani Niranjan;Sachit Rao\",\"doi\":\"10.1109/TKDE.2023.3345658\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Standard metrics of performance of classifiers, such as accuracy and sensitivity, do not reveal the trust or confidence in the predicted labels of data. While other metrics such as the computed probability of a label or the signed distance from a hyperplane can act as a trust measure, these are subjected to heuristic thresholds. This paper presents a convex optimization-based supervised classifier that sequentially partitions a dataset into several ellipsoids, where each ellipsoid contains nearly all points of the same label. By stating classification rules based on this partitioning, Bayes’ formula is then applied to calculate a trust score to a label assigned to a test datapoint determined from these rules. The proposed Sequential Ellipsoidal Partitioning Classifier (SEP-C) exposes dataset irregularities, such as degree of overlap, without requiring a separate exploratory data analysis. The rules of classification, which are free of hyperparameters, are also not affected by class-imbalance, the underlying data distribution, or number of features. SEP-C does not require the use of non-linear kernels when the dataset is not linearly separable. The performance, and comparison with other methods, of SEP-C is demonstrated on the XOR-problem, circle dataset, and other open-source datasets.\",\"PeriodicalId\":13496,\"journal\":{\"name\":\"IEEE Transactions on Knowledge and Data Engineering\",\"volume\":\"36 12\",\"pages\":\"7757-7771\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2023-12-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Knowledge and Data Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10373975/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10373975/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

分类器的标准性能指标，如准确度和灵敏度，并不能揭示对数据预测标签的信任或信心。虽然标签的计算概率或与超平面的签名距离等其他指标可以作为信任度指标，但这些指标都受启发式阈值的限制。本文提出了一种基于凸优化的监督分类器，它能将数据集按顺序分割成多个椭球体，其中每个椭球体几乎都包含相同标签的所有点。在此基础上说明分类规则，然后应用贝叶斯公式计算根据这些规则确定的测试数据点的信任分值。所提出的序列椭圆分区分类器（SEP-C）可揭示数据集的不规则性，如重叠程度，而无需进行单独的探索性数据分析。分类规则不受超参数影响，也不受类别不平衡、基础数据分布或特征数量的影响。当数据集不可线性分离时，SEP-C 不需要使用非线性核。SEP-C 的性能以及与其他方法的比较在 XOR 问题、圆圈数据集和其他开源数据集上得到了验证。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Classification With Trust: A Supervised Approach Based on Sequential Ellipsoidal Partitioning

Standard metrics of performance of classifiers, such as accuracy and sensitivity, do not reveal the trust or confidence in the predicted labels of data. While other metrics such as the computed probability of a label or the signed distance from a hyperplane can act as a trust measure, these are subjected to heuristic thresholds. This paper presents a convex optimization-based supervised classifier that sequentially partitions a dataset into several ellipsoids, where each ellipsoid contains nearly all points of the same label. By stating classification rules based on this partitioning, Bayes’ formula is then applied to calculate a trust score to a label assigned to a test datapoint determined from these rules. The proposed Sequential Ellipsoidal Partitioning Classifier (SEP-C) exposes dataset irregularities, such as degree of overlap, without requiring a separate exploratory data analysis. The rules of classification, which are free of hyperparameters, are also not affected by class-imbalance, the underlying data distribution, or number of features. SEP-C does not require the use of non-linear kernels when the dataset is not linearly separable. The performance, and comparison with other methods, of SEP-C is demonstrated on the XOR-problem, circle dataset, and other open-source datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Knowledge and Data Engineering 工程技术-工程：电子与电气

CiteScore

11.70

自引率

3.40%

发文量

515

审稿时长

6 months

期刊介绍： The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.