FACT - Fine grained Assessment of web page CredibiliTy

Q2 Arts and Humanities

Platonic Investigations, published 2019-10-01, DOI: 10.1109/TENCON.2019.8929515

Shriyansh Agrawal, L. Sanagavarapu, Y. Reddy

Volume 56 (1), pp. 1088-1097

Citations: 1

Abstract

With more than a trillion web pages, there is a plethora of content available for consumption. Search engine queries invariably return an overwhelming amount of information, some of it relevant and some irrelevant. The information provided is often conflicting, ambiguous, or inconsistent, contributing to a loss of credibility of the content. Researchers have previously proposed approaches for credibility assessment and enumerated factors that influence the credibility of web pages. In this work, we detail WEBCred, a framework for automated genre-aware credibility assessment of web pages. We developed a tool based on the proposed framework that extracts web page feature instances and identifies the genre a web page belongs to while assessing its Genre Credibility Score ($GCS$). We validated our approach on an ‘Information Security’ dataset of 8,550 URLs with 171 features across 7 genres. A supervised Gradient Boosted Decision Tree classifier identified genres with 88.75% testing accuracy under 10-fold cross-validation, an improvement over the current benchmark. We also evaluated our approach on ‘Health’ domain web pages and obtained comparable results. The calculated $GCS$ correlated 69% with the crowdsourced Web of Trust ($WOT$) score and 13% with the algorithm-based Alexa ranking across 5 Information Security groups. This difference in correlation indicates that, in both experiments, our $GCS$ approach aligns more closely with the human ($WOT$) way of assessing the web than with the algorithmic (Alexa) way.
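The abstract reports agreement between $GCS$ and external signals as percentage correlations. The sketch below shows one way such a figure could be computed. It assumes, since the abstract does not say, that the reported values are Pearson correlation coefficients over per-URL score pairs (so "69%" is read as r = 0.69). The function name `pearson_r` and the score lists are hypothetical illustrations, not code or data from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists.

    Assumption: the paper's "correlated 69%" is read as r = 0.69; the
    source does not state which correlation measure was used.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two (unnormalized) variance terms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, NOT from the paper: GCS and WOT scores for five URLs.
gcs_scores = [0.82, 0.45, 0.67, 0.30, 0.91]
wot_scores = [88, 52, 70, 41, 93]

r = pearson_r(gcs_scores, wot_scores)
print(f"GCS vs WOT: r = {r:.2f} ({r:.0%})")
```

When comparing against Alexa, note that a lower Alexa rank means a more popular site, so ranks would need to be negated or inverted before correlating them with a credibility score for a positive coefficient to signal agreement. The abstract does not say how this was handled, and a rank-based measure such as Spearman's coefficient may be a better fit for ordinal rankings.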
