A URL-based Analysis of WWW Structure and Dynamics

Jeffery Kline, Edward Oakes, P. Barford
Venue: 2019 Network Traffic Measurement and Analysis Conference (TMA)
Publication date: 2019-06-01
DOI: 10.23919/TMA.2019.8784665 (https://doi.org/10.23919/TMA.2019.8784665)
Citations: 1

Abstract

Understanding the evolving characteristics of the World Wide Web is challenging due to its immense size and diversity. In this paper, we investigate Web structure and dynamics by analyzing over 1 trillion URLs requested during Web browsing by a 2-million-person user panel over a period of 12 months. We begin by examining the lifetime of URLs and find that, in contrast to early studies, the set of URLs visited is highly dynamic and well modeled by a gamma distribution. Next, we analyze URL-traversal patterns and find that browsing behaviors differ substantially from hyperlink connectivity. One consequence is that the Web structure derived from hyperlink connectivity does not extend directly to actual user behavior. Finally, we consider the commonly used path and query portions of URLs and highlight their characteristics when used by different website genres. These semantic differences suggest that URL structure can broadly classify the kind of resource that a URL references. Our analyses lead to a set of proposed enhancements to the URL standard that would improve Web manageability and transparency and take a step toward the semantic web.
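
To make the "path and query portions" mentioned in the abstract concrete, the following is a minimal sketch of how a URL can be split into those components using Python's standard library. It is not the authors' method: the function name `url_features`, the chosen features (path depth, query keys), and the example URL are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qsl


def url_features(url: str) -> dict:
    """Extract simple structural features from a URL (hypothetical feature set)."""
    parts = urlsplit(url)
    # Non-empty path segments, e.g. "/news/2019/story.html" -> 3 segments.
    segments = [s for s in parts.path.split("/") if s]
    # Query parameters as (key, value) pairs; keep keys that have empty values.
    params = parse_qsl(parts.query, keep_blank_values=True)
    return {
        "host": parts.hostname,
        "path_depth": len(segments),
        "num_query_params": len(params),
        "query_keys": sorted({key for key, _ in params}),
    }


if __name__ == "__main__":
    # Illustrative URL only; it does not come from the paper's data set.
    print(url_features("https://example.com/news/2019/story.html?id=42&ref=home"))
```

Features of this kind could, in principle, be aggregated per website genre to compare how path and query usage differs, which is the sort of comparison the abstract describes.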