Improve Decision Trees for Probability-Based Ranking by Lazy Learners

H. Liang, Yuhong Yan
DOI: 10.1109/ICTAI.2006.65
Published in: 2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06), 2006-11-13
Citations: 8

Abstract

Existing work shows that classic decision trees have inherent deficiencies in obtaining a good probability-based ranking (e.g. AUC). This paper aims to improve ranking performance under decision-tree paradigms by presenting two new models. The intuition behind our work is that probability-based ranking is a relative metric among samples; therefore, distinct probabilities are crucial for accurate ranking. The first model, the lazy distance-based tree (LDTree), uses a lazy learner at each leaf to explicitly distinguish the different contributions of leaf samples when estimating the probabilities for an unlabeled sample. The second model, the eager distance-based tree (EDTree), improves on LDTree by turning it into an eager algorithm. In both models, each unlabeled sample is assigned a set of distinct class-membership probabilities instead of a set of uniform ones, which gives finer resolution for differentiating samples and leads to improved ranking. On 34 UCI sample sets, experiments verify that our models greatly outperform C4.5, C4.4, and other standard smoothing methods designed for better ranking.
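The core idea of LDTree, as described above, is that two queries landing in the same leaf should not receive the same leaf-frequency probability: nearer training samples in the leaf should contribute more. A minimal sketch of such a distance-weighted leaf estimate is given below; the `leaf_probabilities` helper and the plain inverse-distance weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
import math

def leaf_probabilities(query, leaf_samples, classes):
    """Distance-weighted class-probability estimate at a decision-tree leaf.

    query        -- feature vector of the unlabeled sample.
    leaf_samples -- list of (feature_vector, label) training pairs in the leaf.
    classes      -- list of possible class labels.

    Nearer leaf samples get larger weights, so different queries reaching the
    same leaf receive distinct probabilities (the intuition behind LDTree,
    sketched here with simple inverse-distance weighting).
    """
    weights = {c: 0.0 for c in classes}
    for features, label in leaf_samples:
        dist = math.dist(query, features)       # Euclidean distance
        weights[label] += 1.0 / (1.0 + dist)    # inverse-distance weight
    total = sum(weights.values())
    if total == 0.0:
        # Empty leaf: fall back to a uniform distribution over classes.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: w / total for c, w in weights.items()}
```

For example, with leaf samples `[((0, 0), "pos"), ((1, 1), "neg")]`, a query near `(0, 0)` is scored mostly "pos" while a query near `(1, 1)` is scored mostly "neg", even though both fall into the same leaf; a plain frequency estimate would give both the same 0.5/0.5 split, which is exactly the tie that hurts ranking.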