Web site classification based on URL and content: Algerian vs. non-Algerian case
Abdessamed Ouessai, Elberrichi Zakaria
2015 12th International Symposium on Programming and Systems (ISPS), 2015-04-28
https://doi.org/10.1109/ISPS.2015.7244974
Web page classification by topic or sentiment is a common application of web content mining techniques. This paper presents a novel application: identifying the nation targeted by a specific web page. The aim is to automatically distinguish websites targeting a specific nation, using both the URL and the content of a web page. We address the case of identifying Algerian-interest web pages using a machine learning approach. We describe how the data for the supervised learning phase was acquired and adapted into a usable dataset, and how it was used to construct three distinct classifiers, each built from a different part of the data. The resulting classifiers performed well on this task (F-score up to 0.93).
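To make the URL-based part of the idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the learning algorithm or the features, so the multinomial naive Bayes model, the `url_tokens` tokenizer, the class labels, and the example URLs are all assumptions chosen for illustration. The `f_score` helper computes the standard F-score (harmonic mean of precision and recall), the metric quoted in the abstract.

```python
import math
import re
from collections import Counter


def url_tokens(url):
    """Split a URL into lowercase alphanumeric tokens (hypothetical feature choice)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed learner, not from the paper)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.total = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict(self, tokens):
        def log_score(c):
            n = sum(self.counts[c].values())
            score = math.log(self.prior[c] / self.total)
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.counts[c][t] + 1) / (n + len(self.vocab)))
            return score

        return max(self.classes, key=log_score)


def f_score(y_true, y_pred, positive):
    """F-score of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy training set; the URLs and labels are illustrative only.
train = [
    ("http://news.example.dz/algerie/sport", "algerian"),
    ("http://www.example.dz/oran/actualite", "algerian"),
    ("http://www.example.com/world/sport", "other"),
    ("http://blog.example.org/travel/paris", "other"),
]
model = NaiveBayes().fit([url_tokens(u) for u, _ in train], [y for _, y in train])
prediction = model.predict(url_tokens("http://shop.example.dz/alger"))
```

A content-based classifier could reuse the same `NaiveBayes` class with tokens taken from the page text instead of the URL. How the paper's three classifiers actually divide the data is not stated in the abstract.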