Heterogeneous learner for Web page classification

Hwanjo Yu, K. Chang, Jiawei Han
{"title":"Heterogeneous learner for Web page classification","authors":"Hwanjo Yu, K. Chang, Jiawei Han","doi":"10.1109/ICDM.2002.1183999","DOIUrl":null,"url":null,"abstract":"Classification of an interesting class of Web pages has been an interesting problem. Typical machine learning algorithms for this problem require two classes of data for training: positive and negative training examples. However in application to Web page classification, gathering an unbiased sample of negative examples appears to be difficult. We propose a heterogeneous learning framework for classifying Web pages, which (1) eliminates the need for negative training data, and (2) increases classification accuracy by using two heterogeneous learners. Our framework uses two heterogeneous learners-a decision list and a linear separator which complement each other-to eliminate the need for negative training data in the training phase and to increase the accuracy in the testing phase. Our results show that our heterogeneous framework achieves high accuracy without requiring negative training data; it enhances the accuracy of linear separators by reducing the errors on \"low-margin data\". That is, it classifies more accurately while requiring less human efforts in training.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2002.1183999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

Abstract

Classification of an interesting class of Web pages has been an interesting problem. Typical machine learning algorithms for this problem require two classes of data for training: positive and negative training examples. However in application to Web page classification, gathering an unbiased sample of negative examples appears to be difficult. We propose a heterogeneous learning framework for classifying Web pages, which (1) eliminates the need for negative training data, and (2) increases classification accuracy by using two heterogeneous learners. Our framework uses two heterogeneous learners-a decision list and a linear separator which complement each other-to eliminate the need for negative training data in the training phase and to increase the accuracy in the testing phase. Our results show that our heterogeneous framework achieves high accuracy without requiring negative training data; it enhances the accuracy of linear separators by reducing the errors on "low-margin data". That is, it classifies more accurately while requiring less human efforts in training.
网页分类的异构学习器
对一类有趣的Web页面进行分类是一个有趣的问题。典型的机器学习算法需要两类数据进行训练:正训练样本和负训练样本。然而,在网页分类的应用中,收集一个无偏的负例样本似乎是困难的。我们提出了一个异构学习框架来分类网页,它(1)消除了对负训练数据的需要,(2)通过使用两个异构学习器来提高分类精度。我们的框架使用两个异构学习器——一个决策列表和一个相互补充的线性分隔符——以消除训练阶段对负训练数据的需求,并提高测试阶段的准确性。结果表明,我们的异构框架在不需要负训练数据的情况下获得了较高的准确率;它通过减少“低边际数据”上的误差,提高了线性分割器的精度。也就是说,它可以更准确地分类,同时在训练中需要更少的人力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信