{"title":"二进制分类器错位的逻辑警报","authors":"Andrés Corrada-Emmanuel, Ilya Parker, Ramesh Bharadwaj","doi":"arxiv-2409.11052","DOIUrl":null,"url":null,"abstract":"If two agents disagree in their decisions, we may suspect they are not both\ncorrect. This intuition is formalized for evaluating agents that have carried\nout a binary classification task. Their agreements and disagreements on a joint\ntest allow us to establish the only group evaluations logically consistent with\ntheir responses. This is done by establishing a set of axioms (algebraic\nrelations) that must be universally obeyed by all evaluations of binary\nresponders. A complete set of such axioms are possible for each ensemble of\nsize N. The axioms for $N = 1, 2$ are used to construct a fully logical alarm -\none that can prove that at least one ensemble member is malfunctioning using\nonly unlabeled data. The similarities of this approach to formal software\nverification and its utility for recent agendas of safe guaranteed AI are\ndiscussed.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A logical alarm for misaligned binary classifiers\",\"authors\":\"Andrés Corrada-Emmanuel, Ilya Parker, Ramesh Bharadwaj\",\"doi\":\"arxiv-2409.11052\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"If two agents disagree in their decisions, we may suspect they are not both\\ncorrect. This intuition is formalized for evaluating agents that have carried\\nout a binary classification task. Their agreements and disagreements on a joint\\ntest allow us to establish the only group evaluations logically consistent with\\ntheir responses. This is done by establishing a set of axioms (algebraic\\nrelations) that must be universally obeyed by all evaluations of binary\\nresponders. A complete set of such axioms are possible for each ensemble of\\nsize N. The axioms for $N = 1, 2$ are used to construct a fully logical alarm -\\none that can prove that at least one ensemble member is malfunctioning using\\nonly unlabeled data. The similarities of this approach to formal software\\nverification and its utility for recent agendas of safe guaranteed AI are\\ndiscussed.\",\"PeriodicalId\":501301,\"journal\":{\"name\":\"arXiv - CS - Machine Learning\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11052\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11052","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
If two agents disagree in their decisions, we may suspect they are not both correct. This intuition is formalized for evaluating agents that have carried out a binary classification task. Their agreements and disagreements on a joint test allow us to establish the only group evaluations logically consistent with their responses. This is done by establishing a set of axioms (algebraic relations) that must be universally obeyed by all evaluations of binary responders. A complete set of such axioms is possible for each ensemble of size N. The axioms for $N = 1, 2$ are used to construct a fully logical alarm: one that can prove, using only unlabeled data, that at least one ensemble member is malfunctioning. The similarities of this approach to formal software verification, and its utility for recent agendas of guaranteed safe AI, are discussed.
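
To make the construction concrete, below is a minimal brute-force sketch of such an alarm for a pair of binary classifiers. This is an illustration of the idea, not the paper's algorithm: the function names, the 55% "better-than-random" accuracy threshold, the grid resolution, and the tolerance are all assumptions introduced here. A candidate evaluation is parameterized by the prevalence of label 'a', each classifier's per-label accuracies, and the covariance of their errors on each label; the sketch then searches for any such evaluation that reproduces the observed frequencies of the four decision pairs on a shared unlabeled test. If none exists, at least one classifier must be violating the better-than-random assumption.

```python
import itertools

def predicted_pair_freqs(pi, p1a, p1b, p2a, p2b, ga, gb):
    """Decision-pair frequencies implied by one candidate evaluation.

    pi  -- prevalence of label 'a' in the test set
    pIa -- accuracy of classifier I on items whose true label is 'a'
    pIb -- accuracy of classifier I on items whose true label is 'b'
    ga  -- covariance of the two classifiers' correctness on 'a' items
    gb  -- covariance of the two classifiers' correctness on 'b' items
    Returns frequencies of the decision pairs (a,a), (a,b), (b,a), (b,b).
    """
    f_aa = pi * (p1a * p2a + ga) + (1 - pi) * ((1 - p1b) * (1 - p2b) + gb)
    f_ab = pi * (p1a * (1 - p2a) - ga) + (1 - pi) * ((1 - p1b) * p2b - gb)
    f_ba = pi * ((1 - p1a) * p2a - ga) + (1 - pi) * (p1b * (1 - p2b) - gb)
    f_bb = pi * ((1 - p1a) * (1 - p2a) + ga) + (1 - pi) * (p1b * p2b + gb)
    return (f_aa, f_ab, f_ba, f_bb)

def linspace(lo, hi, n):
    """Evenly spaced grid, avoiding a numpy dependency."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def alarm(observed, tol=0.02, steps=10):
    """True if NO better-than-random evaluation explains the observations.

    observed -- frequencies of decision pairs (a,a), (a,b), (b,a), (b,b)
                counted on a shared *unlabeled* test set.
    Coarse brute-force scan; exact algebra or a solver would replace this
    in a serious implementation.
    """
    accs = linspace(0.55, 1.0, steps)   # assumed better-than-random accuracies
    prevs = linspace(0.05, 0.95, 11)    # candidate prevalences of label 'a'
    for pi, p1a, p1b, p2a, p2b in itertools.product(prevs, accs, accs, accs, accs):
        # Frechet bounds constrain the covariance of two correctness indicators
        for ga in linspace(max(0.0, p1a + p2a - 1.0) - p1a * p2a,
                           min(p1a, p2a) - p1a * p2a, 3):
            for gb in linspace(max(0.0, p1b + p2b - 1.0) - p1b * p2b,
                               min(p1b, p2b) - p1b * p2b, 3):
                pred = predicted_pair_freqs(pi, p1a, p1b, p2a, p2b, ga, gb)
                if max(abs(p - o) for p, o in zip(pred, observed)) < tol:
                    return False  # a consistent evaluation exists: no alarm
    return True  # nothing in the evaluation space fits: someone is malfunctioning
```

Under these assumptions, two classifiers agreeing on 80% of the test, e.g. observed = (0.4, 0.1, 0.1, 0.4), admit a consistent better-than-random evaluation, so alarm(...) returns False; near-total disagreement, e.g. (0.02, 0.48, 0.48, 0.02), forces a per-label agreement rate below what any pair of 55%-accurate classifiers can produce, and the alarm fires. Note that no labels are consulted anywhere: the alarm reasons purely from the logical consistency of the evaluation space with the observed response statistics.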