Using a thesaurus-based approach for the categorisation of web sites

S. Pudaruth, Youven Ankiah, Keshav Sembhoo
2014 Seventh International Conference on Contemporary Computing (IC3), published 2014-08-01
DOI: 10.1109/IC3.2014.6897245 (https://doi.org/10.1109/IC3.2014.6897245)
Citations: 2

Abstract

With the growing number of Mauritian-owned websites on the internet, classifying them is becoming increasingly important. Our objective in this research is to classify a list of websites into seven broad categories: education, entertainment, government, health, tourism, sports and shopping. The homepages of 319 websites were used in this study. We exploited the rich source of information (features) contained in each homepage, such as the meta tags, title tag, heading tags, hyperlinks, the content of the website and the domain name of the website. This information was then used to assign each website to its most appropriate category. Several parameters, such as the weight applied to each feature and the keywords used to classify the websites, were tuned to yield better results. The experimental evaluation showed that the implemented method is highly accurate. In particular, we obtained an accuracy of about 95%, which is higher than all existing approaches considered so far in the research literature.
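The abstract describes scoring a homepage by matching category keywords against several features, each carrying its own weight. A minimal Python sketch of that general idea follows. It is not the authors' implementation: the keyword lists, the weight values, the substring matching on the domain name, and the names `KEYWORDS`, `WEIGHTS`, `HomepageFeatures`, `score` and `classify` are all assumptions made for illustration, and only three of the seven categories are shown.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical keyword lists standing in for a thesaurus (not from the paper).
KEYWORDS = {
    "education": {"school", "university", "student", "course"},
    "tourism": {"hotel", "resort", "travel", "beach"},
    "shopping": {"shop", "cart", "price", "sale"},
}

# Hypothetical feature weights; the paper tunes such weights but the
# abstract does not give their values.
WEIGHTS = {"domain": 5.0, "title": 4.0, "meta": 3.0,
           "heading": 2.0, "link": 1.5, "body": 1.0}

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class HomepageFeatures(HTMLParser):
    """Collects homepage words, grouped by the feature they came from."""

    def __init__(self):
        super().__init__()
        self.words = {name: [] for name in WEIGHTS if name != "domain"}
        self._stack = []  # open title/heading/anchor tags
        self._skip = 0    # depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in {"keywords", "description"}:
            self.words["meta"] += tokenize(attrs.get("content") or "")
        elif tag in {"script", "style"}:
            self._skip += 1
        elif tag in {"title", "a"} or tag in HEADINGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in {"script", "style"}:
            self._skip = max(0, self._skip - 1)
        elif self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._skip:
            return
        top = self._stack[-1] if self._stack else None
        if top == "title":
            feature = "title"
        elif top == "a":
            feature = "link"
        elif top in HEADINGS:
            feature = "heading"
        else:
            feature = "body"
        self.words[feature] += tokenize(data)


def score(url, html):
    """Return a Counter mapping each category to its weighted keyword score."""
    parser = HomepageFeatures()
    parser.feed(html)
    parser.close()
    host = (urlparse(url).hostname or "").lower()
    scores = Counter()
    for category, keywords in KEYWORDS.items():
        # Domain names run words together, so keywords are matched
        # as substrings of the host name (an assumption of this sketch).
        total = WEIGHTS["domain"] * sum(k in host for k in keywords)
        for feature, words in parser.words.items():
            total += WEIGHTS[feature] * sum(w in keywords for w in words)
        scores[category] = total
    return scores


def classify(url, html):
    """Return the highest-scoring category, or None if no keyword matched."""
    best, value = score(url, html).most_common(1)[0]
    return best if value > 0 else None
```

Calling `classify(url, html)` with a homepage's URL and HTML source returns the best-matching category name, or `None` when no keyword appears in any feature. Keeping the weights in a single dictionary makes it straightforward to tune them against a labelled set of sites, which is the kind of parameter tuning the abstract mentions.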