Language specific crawling based on web pages features
Authors: Masomeh Azimzadeh, Alireza Yari, M. Kargar
Venue: 2010 International Conference on Multimedia Computing and Information Technology (MCIT)
Published: 2010-03-02
DOI: 10.1109/MCIT.2010.5444865 (https://doi.org/10.1109/MCIT.2010.5444865)
Because the World Wide Web contains a large volume of data in many languages, retrieving language-specific information poses a new challenge in information retrieval, known as language-specific crawling. This paper proposes a new approach to language-specific crawling that combines selected content and context features of web documents. The approach has been implemented for the Persian language and evaluated on the Iranian web domain. The evaluation results show that it can improve crawling performance in terms of both speed and coverage.
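
The abstract does not list which content and context features are used, so the following Python sketch is only an illustration of the general idea of combining the two kinds of evidence. Every feature, weight, threshold, and function name here is an assumption made for the example (script ratio as a content feature; the `.ir` top-level domain and a declared `lang` attribute as context features) and should not be read as the authors' method.

```python
from typing import Optional
from urllib.parse import urlparse

# Letters used in Persian but not in standard Arabic (illustrative choice).
PERSIAN_ONLY = set("پچژگ")


def content_score(text: str) -> float:
    """Hypothetical content feature: share of letters in the Arabic-script
    Unicode block, plus a small bonus if Persian-only letters occur."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    arabic_script = sum(1 for c in letters if "\u0600" <= c <= "\u06FF")
    ratio = arabic_script / len(letters)
    bonus = 0.1 if any(c in PERSIAN_ONLY for c in letters) else 0.0
    return min(1.0, ratio + bonus)


def context_score(url: str, html_lang: Optional[str] = None) -> float:
    """Hypothetical context features: the host's top-level domain and the
    language declared by the page (e.g. the HTML lang attribute)."""
    host = urlparse(url).hostname or ""
    score = 0.0
    if host.endswith(".ir"):
        score += 0.5
    if html_lang and html_lang.lower().startswith("fa"):
        score += 0.5
    return score


def is_target_language(
    text: str,
    url: str,
    html_lang: Optional[str] = None,
    w_content: float = 0.7,   # assumed weight, not from the paper
    w_context: float = 0.3,   # assumed weight, not from the paper
    threshold: float = 0.5,   # assumed decision threshold
) -> bool:
    """Combine both scores linearly and compare against the threshold."""
    combined = (w_content * content_score(text)
                + w_context * context_score(url, html_lang))
    return combined >= threshold
```

A crawler built along these lines could call `is_target_language` on each fetched page and only enqueue the outgoing links of pages that pass, which is one plausible way such a filter might affect both speed and coverage.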