Computer vision-based analysis of web page structure for assistive interfaces

Proceedings of the 13th Web for All Conference, published 2016-04-11, DOI: 10.1145/2899475.2899506

M. Cormier


Citations: 4

Abstract

My PhD research aims to develop novel methods for identifying web page structure through visual analysis of web pages as images. The intention is then to combine this back-end design with various front-end applications in order to provide improved web experiences for users with assistive needs (e.g. assisting visually impaired users by supporting more selective screen reader output, or improving the experience of users with cognitive deficits by allowing clutter to be reduced or selected web page content to be zoomed in on). I propose to build a comprehensive computer vision-based system that analyses the semantic structure of a web page based purely on an image of the rendered page, and that produces a rich representation of the page as a tree of regions labelled according to their semantic role. Most research into web page segmentation has focused on the structure of the DOM tree and on visual features derived from properties specified in the DOM tree. I argue, however, that the image of the rendered page may be a better representation to use: it is created by the page designer to convey the structure of the page to the user, whereas the source code and DOM tree are intended only to make the browser's rendering engine produce the correct appearance, and they treat many types of content as black boxes. Additionally, my proposed system uses exactly the information a user sees, regardless of how the page is implemented; this gives it advantages in implementation-independence and versatility.

