Classification of web pages on attractiveness: A supervised learning approach

Ganesh Khade, Sudhakar Kumar, S. Bhattacharya
Published in: 2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)
Publication date: 2012-12-01
DOI: 10.1109/IHCI.2012.6481867 (https://doi.org/10.1109/IHCI.2012.6481867)
Citations: 4

Abstract
Random surfers spend very little time on a web page. If the most important content on a page fails to attract a surfer's attention within that short time span, the surfer moves on to another page, defeating the purpose of the web page designer. To predict whether the contents of a web page will catch a random surfer's attention, we propose a machine-learning-based approach that classifies web pages into “bad” and “not bad” classes, where the “bad” class implies poor attention-drawing ability. To develop the classifier, we divide web page contents into “objects”, which are coherent regions of a web page that convey the same information. We surveyed 100 web pages sampled from the Internet to identify the types and frequencies of objects used in web page design. From this survey, we identified the six types of objects that are most important in determining the class of a web page in terms of its attention-drawing capability. We used the WEKA tool to implement the machine learning approach. Two percentage-split strategies and three cross-validation strategies were used to check the accuracy of the classifier. We experimented with 65 algorithms supported by WEKA and found that, among the 65, the RBF network and Random subspace algorithms give the best performance, with about 83% accuracy.
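
The two evaluation styles named in the abstract, percentage split and cross-validation, can be illustrated with a minimal Python sketch. This is not the authors' WEKA setup. The 66% split, the 10 folds, the synthetic six-feature data, and the nearest-centroid classifier are all assumptions chosen for illustration, since the abstract does not state the split percentages, the fold counts, or the object types. All function names are hypothetical.

```python
import random


def make_synthetic_pages(n=100, seed=42):
    """Build n fake pages: six object-count features plus a class label.

    The features and labels are synthetic placeholders, not the paper's data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice(["bad", "not bad"])
        base = 2 if label == "bad" else 6
        features = [base + rng.randint(0, 5) for _ in range(6)]
        rows.append((features, label))
    return rows


def percentage_split(rows, train_fraction, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def train_centroids(rows):
    """Stand-in classifier: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def accuracy(train, test):
    """Train on train, then return the fraction of test rows predicted correctly."""
    centroids = train_centroids(train)
    hits = sum(predict(centroids, f) == y for f, y in test)
    return hits / len(test)


def cross_validate(rows, k, seed=0):
    """Return the mean accuracy over k folds of a shuffled copy of rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    scores = []
    for train_idx, test_idx in k_fold_indices(len(shuffled), k):
        train = [shuffled[i] for i in train_idx]
        test = [shuffled[i] for i in test_idx]
        scores.append(accuracy(train, test))
    return sum(scores) / k


if __name__ == "__main__":
    pages = make_synthetic_pages()
    train_rows, test_rows = percentage_split(pages, 0.66)  # assumed 66% split
    print("percentage split accuracy:", accuracy(train_rows, test_rows))
    print("10-fold CV accuracy:", cross_validate(pages, 10))  # assumed 10 folds
```

A percentage split gives a single estimate that depends on which pages land in the test set. Cross-validation averages over several held-out folds, which tends to be steadier on a sample as small as 100 pages.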