FACT - Fine grained Assessment of web page CredibiliTy
Authors: Shriyansh Agrawal, L. Sanagavarapu, Y. Reddy
DOI: 10.1109/TENCON.2019.8929515 (https://doi.org/10.1109/TENCON.2019.8929515)
Published: 2019-10-01 (Journal Article)
Venue (as listed): Platonic Investigations, vol. 56, no. 1, pp. 1088-1097
Citation count: 1
Abstract
With more than a trillion web pages, there is a plethora of content available for consumption. Search engine queries invariably return an overwhelming amount of information, some of it relevant and some irrelevant. The information provided is often conflicting, ambiguous, and inconsistent, which contributes to a loss of credibility of the content. In the past, researchers have proposed approaches for credibility assessment and enumerated factors influencing the credibility of web pages. In this work, we detail the WEBCred framework for automated, genre-aware credibility assessment of web pages. We developed a tool based on the proposed framework that extracts web page feature instances and identifies the genre a web page belongs to while assessing its Genre Credibility Score ($GCS$). We validated our approach on an ‘Information Security’ dataset of 8,550 URLs with 171 features across 7 genres. The supervised learning algorithm Gradient Boosted Decision Tree classified genres with 88.75% testing accuracy under 10-fold cross-validation, an improvement over the current benchmark. We also examined our approach on ‘Health’ domain web pages and obtained comparable results. The calculated $GCS$ correlated 69% with the crowdsourced Web of Trust ($WOT$) score and 13% with the algorithm-based Alexa ranking across 5 Information Security groups. This difference in correlation indicates that, in both experiments, our $GCS$ approach aligns more closely with the human ($WOT$) way of web assessment than with the algorithmic (Alexa) way.
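The comparison between $GCS$, $WOT$ and Alexa rests on correlating per-URL score lists. The abstract does not name the correlation measure, so the sketch below assumes it is the Pearson coefficient (read "69%" as r = 0.69). The helper names `pearson` and `compare_with_references` are hypothetical, and the numbers in the usage part are illustrative toy values, not data from the paper. It also assumes the Alexa signal has already been oriented so that larger means "better", since a raw Alexa rank is smaller for more popular sites.

```python
import math
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances without the 1/n factor; it cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def compare_with_references(
    gcs: Sequence[float], references: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Correlate a GCS list with each reference signal (hypothetical helper)."""
    return {name: pearson(gcs, ref) for name, ref in references.items()}


if __name__ == "__main__":
    # Hypothetical toy scores for five URLs; NOT results from the paper.
    gcs = [0.82, 0.64, 0.71, 0.35, 0.50]
    wot = [88, 70, 75, 40, 61]
    # Assumed already oriented so that higher means more popular.
    alexa_score = [0.2, 0.9, 0.4, 0.6, 0.3]
    results = compare_with_references(gcs, {"WOT": wot, "Alexa": alexa_score})
    for name, r in results.items():
        print(f"{name}: r = {r:.2f}")
```

A higher coefficient against one reference than the other is the kind of gap the abstract interprets as closer alignment with human ($WOT$) assessment; with real data one would also check whether the gap holds per group rather than only in aggregate.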