Authors: K. Kumari, A. Reddy
DOI: 10.1109/ICCCT2.2014.7066692 (https://doi.org/10.1109/ICCCT2.2014.7066692)
Published: 2014-12-01
Venue: 2021 RIVF International Conference on Computing and Communication Technologies (RIVF), pp. 1-6
Identification and classification of emerging genres in WebPages
Information on the World Wide Web is dynamic and growing rapidly. Existing topic-based search engines are not adequate for retrieving the information users need, so genre-based search engines are required. Developing them first requires identifying web genres. At present, a few genre corpora exist, covering web genres such as articles, online news, and journalistic genres. The dynamic nature of the web allows new genres to come into existence; these are called emerging genres. This paper proposes two novel algorithms: the Identification of Emerging Genres (IEG) algorithm and the Adjustable Centroid Classification (ACC) algorithm. The IEG algorithm identifies emerging genres in web pages collected randomly from the web, and the ACC algorithm evaluates the performance of a genre corpus. The IEG algorithm identified three emerging genres from 339 randomly selected web pages, starting from a balanced 7-genre corpus for the single-label case and an unbalanced 20-genre corpus for the multi-label case. The performance of the resulting datasets (10-genre single-label and 23-genre multi-label) is evaluated using the ACC algorithm and compared with an SVM classifier and a random forest classifier for single-label classification, and with binary relevance random forest and binary relevance SVM classifiers for multi-label classification. The classification results show that the ACC algorithm gave better results than these existing classification algorithms.
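The abstract does not describe how IEG or ACC work internally, so the following is not the authors' method. It is only a minimal sketch of plain nearest-centroid classification, the general family the name "Adjustable Centroid Classification" suggests, with a hypothetical similarity threshold (`min_similarity`) that marks pages far from every known genre as candidates for an emerging genre. The feature vectors, the threshold value, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(samples):
    """Build one centroid per genre from (feature_vector, genre) pairs."""
    by_genre = defaultdict(list)
    for vec, genre in samples:
        by_genre[genre].append(vec)
    return {genre: centroid(vecs) for genre, vecs in by_genre.items()}


def classify(vec, centroids, min_similarity=0.8):
    """Return the most similar genre, or None when no centroid is close enough.

    The threshold is a hypothetical knob (an assumption, not from the paper):
    pages that return None are only *candidates* for an emerging genre.
    """
    best_genre, best_sim = max(
        ((g, cosine(vec, c)) for g, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return best_genre if best_sim >= min_similarity else None


# Toy 3-feature vectors (made-up values, not data from the paper).
train = [
    ([1.0, 0.0, 0.0], "online news"),
    ([0.9, 0.1, 0.0], "online news"),
    ([0.0, 1.0, 0.0], "articles"),
    ([0.1, 0.9, 0.0], "articles"),
]
centroids = fit_centroids(train)
known = classify([0.8, 0.2, 0.0], centroids)    # close to the "online news" centroid
outlier = classify([0.0, 0.0, 1.0], centroids)  # orthogonal to both centroids
```

In this toy setup, `known` resolves to an existing genre while `outlier` falls below the threshold for every centroid. A real pipeline would still need a way to group such outliers into coherent new genres and to handle multi-label pages, neither of which this sketch attempts.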