An Examination of Genre Attributes for Web Page Classification
Lei Dong, C. Watters, Jack Duffy, M. Shepherd
Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008), 2008-01-07
DOI: 10.1109/HICSS.2008.53
Citation count: 40
Abstract
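In this paper, we describe a set of experiments that examine how various attributes of web genre affect the automatic identification of the genre of web pages. The data set covers four genres: FAQ, News, E-Shopping, and Personal Home Pages. We examine the effect of the number of features used to represent each web page (5, 20, or 100), as well as the effect of the attribute types <content, form, functionality>, used singly and in various combinations. The results indicate that fewer features yield higher precision while more features yield higher recall, and that combinations of attributes always performed better than single attributes.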
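To make the experimental design concrete, the Python sketch below enumerates the attribute configurations described in the abstract and computes per-genre precision and recall, the two measures the results are reported in. It assumes that every non-empty combination of the three attribute types was crossed with each feature count (the abstract says only "singly and in various combinations"). The helper names (`attribute_subsets`, `experiment_grid`, `precision_recall`) and the toy labels are hypothetical. The sketch does not reproduce the authors' feature extraction, feature selection, or classifier.

```python
from itertools import combinations

# Attribute types and feature counts named in the abstract.
ATTRIBUTES = ("content", "form", "functionality")
FEATURE_COUNTS = (5, 20, 100)


def attribute_subsets(attrs):
    """Return every non-empty subset of attrs: singles, pairs, ..., all.

    Assumption: all combinations were tested; the paper may use fewer.
    """
    return [combo
            for r in range(1, len(attrs) + 1)
            for combo in combinations(attrs, r)]


def experiment_grid():
    """Cross every attribute subset with every feature count (hypothetical grid)."""
    return [(subset, k)
            for subset in attribute_subsets(ATTRIBUTES)
            for k in FEATURE_COUNTS]


def precision_recall(y_true, y_pred, genre):
    """Precision and recall for one genre, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == genre and p == genre)
    fp = sum(1 for t, p in pairs if t != genre and p == genre)
    fn = sum(1 for t, p in pairs if t == genre and p != genre)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    print(len(experiment_grid()))  # 7 subsets x 3 feature counts -> 21

    # Toy labels for illustration only; not data from the paper.
    y_true = ["FAQ", "News", "News", "E-Shopping"]
    y_pred = ["FAQ", "News", "FAQ", "E-Shopping"]
    print(precision_recall(y_true, y_pred, "FAQ"))   # (0.5, 1.0)
    print(precision_recall(y_true, y_pred, "News"))  # (1.0, 0.5)
```

The toy run shows the trade-off the abstract reports: over-predicting a genre (here FAQ) keeps its recall high but lowers its precision, while a stricter classifier (here News) gains precision at the cost of recall.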