{"title":"STAND: Data-Efficient and Self-Aware Precondition Induction for Interactive Task Learning","authors":"Daniel Weitekamp, Kenneth Koedinger","doi":"arxiv-2409.07653","DOIUrl":null,"url":null,"abstract":"STAND is a data-efficient and computationally efficient machine learning\napproach that produces better classification accuracy than popular approaches\nlike XGBoost on small-data tabular classification problems like learning rule\npreconditions from interactive training. STAND accounts for a complete set of\ngood candidate generalizations instead of selecting a single generalization by\nbreaking ties randomly. STAND can use any greedy concept construction strategy,\nlike decision tree learning or sequential covering, and build a structure that\napproximates a version space over disjunctive normal logical statements. Unlike\ncandidate elimination approaches to version-space learning, STAND does not\nsuffer from issues of version-space collapse from noisy data nor is it\nrestricted to learning strictly conjunctive concepts. More importantly, STAND\ncan produce a measure called instance certainty that can predict increases in\nholdout set performance and has high utility as an active-learning heuristic.\nInstance certainty enables STAND to be self-aware of its own learning: it knows\nwhen it learns and what example will help it learn the most. We illustrate that\ninstance certainty has desirable properties that can help users select next\ntraining problems, and estimate when training is complete in applications where\nusers interactively teach an AI a complex program.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07653","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
STAND is a data-efficient and computationally efficient machine learning approach that produces better classification accuracy than popular approaches such as XGBoost on small-data tabular classification problems, including learning rule preconditions from interactive training. STAND accounts for a complete set of good candidate generalizations instead of selecting a single generalization by breaking ties randomly. STAND can use any greedy concept construction strategy, such as decision tree learning or sequential covering, to build a structure that approximates a version space over logical statements in disjunctive normal form. Unlike candidate-elimination approaches to version-space learning, STAND neither suffers from version-space collapse on noisy data nor is restricted to learning strictly conjunctive concepts. More importantly, STAND can produce a measure called instance certainty that predicts increases in holdout-set performance and has high utility as an active-learning heuristic. Instance certainty enables STAND to be self-aware of its own learning: it knows when it has learned and which example will help it learn the most. We illustrate that instance certainty has desirable properties that can help users select the next training problems and estimate when training is complete in applications where users interactively teach an AI a complex program.
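
The abstract does not specify how instance certainty is computed, so the following is only a minimal sketch of the general idea of using such a score as an active-learning heuristic: if the learner maintains a set of good candidate generalizations, an instance's certainty can be read as the fraction of candidates that agree on its label, and the least-certain instance is the most informative one to query next. The names `instance_certainty`, `select_next_example`, and the toy rule set are illustrative assumptions, not the paper's API or algorithm.

```python
# Hypothetical sketch of an instance-certainty active-learning heuristic.
# Not STAND's actual implementation.
import numpy as np

def instance_certainty(x, hypotheses):
    """Fraction of candidate generalizations that agree on x's label.

    Stands in for an instance-certainty measure: when every surviving
    candidate predicts the same label for x, certainty is 1.0.
    """
    preds = np.array([h(x) for h in hypotheses])
    _, counts = np.unique(preds, return_counts=True)
    return counts.max() / len(hypotheses)

def select_next_example(pool, hypotheses):
    """Pick the pool instance the learner is least certain about."""
    scores = [instance_certainty(x, hypotheses) for x in pool]
    return int(np.argmin(scores))

# Toy usage: three hypothetical conjunctive rules over 2-D points.
hypotheses = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] > 0.5 and x[1] > 0.2),
    lambda x: int(x[1] > 0.5),
]
pool = [np.array(p) for p in [(0.9, 0.9), (0.6, 0.1), (0.1, 0.1)]]
i = select_next_example(pool, hypotheses)
print(f"query example {i}, "
      f"certainty={instance_certainty(pool[i], hypotheses):.2f}")
```

Under this reading, a high minimum certainty across the remaining pool would also serve as a rough signal that training is complete, matching the abstract's claim that instance certainty can help users estimate when to stop teaching.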