Algorithm for building a website model

Natalia A. Huk, Stanislav V. Dykhanov, Oleh D. Matiushchenko

DOI: 10.26565/2304-6201-2020-47-03
Journal: Meridiano 47-Journal of Global Studies (IF 0.1, Q4, International Relations)
Published: 2020-09-28
Citations: 1

Abstract

The structure of website models has been analyzed. Models representing the Internet space as semantic networks, frame structures and ontologies have been examined, and the web graph model has been chosen to represent a web resource. The pages of a web resource are connected by hyperlinks, which form the internal structure of the resource. To build a website model in the form of a web graph, a method and an algorithm for crawling the pages of a web resource have been developed. The web resource is crawled by depth-first search using a LIFO (Last In, First Out) stack. Links are found by scanning the lines of the page markup and extracting them with regular expressions. Only links to pages within the resource are taken into account during the search; external links are ignored. The crawling procedure is implemented with the Scrapy framework in Python. To account for additional filters used to select pages by given criteria, the rules for selecting URLs in the HTML code have been tightened. Web resources are crawled to build their web graphs, and the resulting graphs are stored as an edge list and an adjacency matrix for further processing. To visualize the graphs and calculate their metric characteristics, the Gephi software environment and the Yifan Hu layout algorithm have been used. The graph diameter, the average vertex degree, the average path length and the graph density are used to analyze the structural connectivity of the graphs studied. The proposed approach can be applied during a site re-engineering procedure.
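The paper implements the crawl with Scrapy and computes the metrics in Gephi. As a framework-free sketch of the same pipeline, the following Python uses an explicit LIFO stack for the depth-first crawl, a regular expression for link extraction, and BFS over the adjacency matrix for the path-based metrics. The `pages` dictionary is purely illustrative: it stands in for live HTTP responses, and any href target outside it plays the role of an external link.

```python
import re
from collections import deque

# The paper extracts links from the page markup with regular expressions,
# so a simple href-attribute pattern is used here.
HREF_RE = re.compile(r'href="([^"]+)"')

def crawl(pages, start):
    """Depth-first crawl of a web resource using an explicit LIFO stack.

    `pages` maps an internal URL to its HTML text; href targets that are
    not keys of `pages` are treated as external links and ignored.
    Returns the edge list of the resulting web graph.
    """
    edges = set()
    visited = set()
    stack = [start]                      # LIFO: the last link found is crawled first
    while stack:
        url = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        for target in HREF_RE.findall(pages[url]):
            if target in pages:          # internal links only
                edges.add((url, target))
                if target not in visited:
                    stack.append(target)
    return sorted(edges)

def metrics(edges):
    """Adjacency matrix and connectivity metrics of a directed web graph."""
    nodes = sorted({u for edge in edges for u in edge})
    idx = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[idx[u]][idx[v]] = 1

    def bfs(src):                        # shortest directed path lengths from src
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in range(n):
                if adj[x][y] and y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return dist

    # shortest-path lengths over all reachable ordered pairs
    paths = []
    for s in range(n):
        d = bfs(s)
        paths.extend(length for t, length in d.items() if t != s)

    m = len(edges)
    return {
        "adjacency": adj,
        "avg_degree": m / n,             # average out-degree
        "density": m / (n * (n - 1)),    # directed graph density
        "avg_path_length": sum(paths) / len(paths),
        "diameter": max(paths),
    }

# Illustrative stand-in for a three-page resource with one external link.
pages = {
    "/":  '<a href="/a">A</a> <a href="/b">B</a> <a href="http://other.example/">ext</a>',
    "/a": '<a href="/b">B</a>',
    "/b": '<a href="/">home</a>',
}
edges = crawl(pages, "/")                # the external link is dropped
```

On this toy resource the crawl yields four internal edges, and `metrics(edges)` gives the same quantities the paper reads off in Gephi (average degree, density, average path length, diameter), computed here on the directed graph.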
Source journal: Meridiano 47-Journal of Global Studies (International Relations)
Self-citation rate: 0.00%
Articles published: 19
Review time: 12 weeks