{"title":"STAND: Data-Efficient and Self-Aware Precondition Induction for Interactive Task Learning","authors":"Daniel Weitekamp, Kenneth Koedinger","doi":"arxiv-2409.07653","DOIUrl":null,"url":null,"abstract":"STAND is a data-efficient and computationally efficient machine learning\napproach that produces better classification accuracy than popular approaches\nlike XGBoost on small-data tabular classification problems like learning rule\npreconditions from interactive training. STAND accounts for a complete set of\ngood candidate generalizations instead of selecting a single generalization by\nbreaking ties randomly. STAND can use any greedy concept construction strategy,\nlike decision tree learning or sequential covering, and build a structure that\napproximates a version space over disjunctive normal logical statements. Unlike\ncandidate elimination approaches to version-space learning, STAND does not\nsuffer from issues of version-space collapse from noisy data nor is it\nrestricted to learning strictly conjunctive concepts. More importantly, STAND\ncan produce a measure called instance certainty that can predict increases in\nholdout set performance and has high utility as an active-learning heuristic.\nInstance certainty enables STAND to be self-aware of its own learning: it knows\nwhen it learns and what example will help it learn the most. We illustrate that\ninstance certainty has desirable properties that can help users select next\ntraining problems, and estimate when training is complete in applications where\nusers interactively teach an AI a complex program.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07653","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
STAND is a data-efficient and computationally efficient machine learning approach that produces better classification accuracy than popular approaches such as XGBoost on small-data tabular classification problems, including learning rule preconditions from interactive training. STAND accounts for a complete set of good candidate generalizations instead of selecting a single generalization by breaking ties randomly. STAND can use any greedy concept construction strategy, such as decision tree learning or sequential covering, to build a structure that approximates a version space over logical statements in disjunctive normal form. Unlike candidate-elimination approaches to version-space learning, STAND neither suffers from version-space collapse on noisy data nor is restricted to learning strictly conjunctive concepts. More importantly, STAND can produce a measure called instance certainty that predicts increases in holdout-set performance and has high utility as an active-learning heuristic. Instance certainty enables STAND to be self-aware of its own learning: it knows when it has learned and which example will help it learn the most. We illustrate that instance certainty has desirable properties that can help users select the next training problems and estimate when training is complete in applications where users interactively teach an AI a complex program.
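
The abstract does not specify how instance certainty is computed, so the following is only a minimal sketch of the general idea of using such a score as an active-learning heuristic: if the learner maintains a set of good candidate generalizations, an instance's certainty can be read as the fraction of candidates that agree on its label, and the least-certain instance is the most informative one to query next. The names `instance_certainty`, `select_next_example`, and the toy rule set are illustrative assumptions, not the paper's API or algorithm.

```python
# Hypothetical sketch of an instance-certainty active-learning heuristic.
# Not STAND's actual implementation.
import numpy as np

def instance_certainty(x, hypotheses):
    """Fraction of candidate generalizations that agree on x's label.

    Stands in for an instance-certainty measure: when every surviving
    candidate predicts the same label for x, certainty is 1.0.
    """
    preds = np.array([h(x) for h in hypotheses])
    _, counts = np.unique(preds, return_counts=True)
    return counts.max() / len(hypotheses)

def select_next_example(pool, hypotheses):
    """Pick the pool instance the learner is least certain about."""
    scores = [instance_certainty(x, hypotheses) for x in pool]
    return int(np.argmin(scores))

# Toy usage: three hypothetical conjunctive rules over 2-D points.
hypotheses = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] > 0.5 and x[1] > 0.2),
    lambda x: int(x[1] > 0.5),
]
pool = [np.array(p) for p in [(0.9, 0.9), (0.6, 0.1), (0.1, 0.1)]]
i = select_next_example(pool, hypotheses)
print(f"query example {i}, "
      f"certainty={instance_certainty(pool[i], hypotheses):.2f}")
```

Under this reading, a high minimum certainty across the remaining pool would also serve as a rough signal that training is complete, matching the abstract's claim that instance certainty can help users estimate when to stop teaching.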