{"title":"Classification With Trust: A Supervised Approach Based on Sequential Ellipsoidal Partitioning","authors":"Ranjani Niranjan;Sachit Rao","doi":"10.1109/TKDE.2023.3345658","DOIUrl":null,"url":null,"abstract":"Standard metrics of performance of classifiers, such as accuracy and sensitivity, do not reveal the trust or confidence in the predicted labels of data. While other metrics such as the computed probability of a label or the signed distance from a hyperplane can act as a trust measure, these are subjected to heuristic thresholds. This paper presents a convex optimization-based supervised classifier that sequentially partitions a dataset into several ellipsoids, where each ellipsoid contains nearly all points of the same label. By stating classification rules based on this partitioning, Bayes’ formula is then applied to calculate a trust score to a label assigned to a test datapoint determined from these rules. The proposed Sequential Ellipsoidal Partitioning Classifier (SEP-C) exposes dataset irregularities, such as degree of overlap, without requiring a separate exploratory data analysis. The rules of classification, which are free of hyperparameters, are also not affected by class-imbalance, the underlying data distribution, or number of features. SEP-C does not require the use of non-linear kernels when the dataset is not linearly separable. The performance, and comparison with other methods, of SEP-C is demonstrated on the XOR-problem, circle dataset, and other open-source datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7757-7771"},"PeriodicalIF":8.9000,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10373975/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Standard metrics of performance of classifiers, such as accuracy and sensitivity, do not reveal the trust or confidence in the predicted labels of data. While other metrics such as the computed probability of a label or the signed distance from a hyperplane can act as a trust measure, these are subjected to heuristic thresholds. This paper presents a convex optimization-based supervised classifier that sequentially partitions a dataset into several ellipsoids, where each ellipsoid contains nearly all points of the same label. By stating classification rules based on this partitioning, Bayes’ formula is then applied to calculate a trust score to a label assigned to a test datapoint determined from these rules. The proposed Sequential Ellipsoidal Partitioning Classifier (SEP-C) exposes dataset irregularities, such as degree of overlap, without requiring a separate exploratory data analysis. The rules of classification, which are free of hyperparameters, are also not affected by class-imbalance, the underlying data distribution, or number of features. SEP-C does not require the use of non-linear kernels when the dataset is not linearly separable. The performance, and comparison with other methods, of SEP-C is demonstrated on the XOR-problem, circle dataset, and other open-source datasets.
期刊介绍:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.