Algorithm Research for the Noise of Information Extraction Based Vision and DOM Tree

2009 International Symposium on Intelligent Ubiquitous Computing and Education. Pub Date: 2009-05-15. DOI: 10.1109/IUCE.2009.47

Tieli Sun, Zhiying Li, Yanji Liu, Zhenghong Liu

Citations: 4

Abstract

Information extraction from websites is a relevant problem today, and it is usually performed by software modules called wrappers. This paper introduces the relevant information extraction technology and works on HTML pages to extract their theme information and content. Noise is removed first: visual blocks are combined with a vision-based DOM tree denoising method to improve the efficiency of extraction.
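The abstract does not spell out the denoising procedure, so the following is only a minimal Python sketch of one common way to remove noise from a DOM tree before extraction. It builds a small tree with the standard `html.parser` module, drops subtrees whose tags are typically non-content, and drops block elements whose text is mostly link text. The tag sets `NOISE_TAGS` and `BLOCK_TAGS`, the threshold `MAX_LINK_DENSITY`, and every function name are illustrative assumptions rather than the authors' method. Real vision-based segmentation would also need rendered layout information (position, size, font), which a plain parser does not provide.

```python
from html.parser import HTMLParser

# Illustrative assumptions, not values taken from the paper.
NOISE_TAGS = {"script", "style", "nav", "footer", "aside"}
BLOCK_TAGS = {"div", "td", "p", "section", "article"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
MAX_LINK_DENSITY = 0.5  # share of link text above which a block counts as noise


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # Node objects and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never have children
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if any.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def text_length(node, inside_link=False):
    """Return (total_chars, link_chars) for the subtree rooted at node."""
    inside_link = inside_link or node.tag == "a"
    total = link = 0
    for child in node.children:
        if isinstance(child, str):
            total += len(child)
            if inside_link:
                link += len(child)
        else:
            sub_total, sub_link = text_length(child, inside_link)
            total += sub_total
            link += sub_link
    return total, link


def is_noise(node):
    """Heuristic noise test: non-content tag, or block dominated by link text."""
    if node.tag in NOISE_TAGS:
        return True
    if node.tag in BLOCK_TAGS:
        total, link = text_length(node)
        return total > 0 and link / total > MAX_LINK_DENSITY
    return False


def extract_text(node):
    """Collect text in document order, skipping subtrees judged to be noise."""
    if is_noise(node):
        return []
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        else:
            parts.extend(extract_text(child))
    return parts


def denoise(html):
    """Parse html and return the text fragments that survive noise removal."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return extract_text(builder.root)
```

Calling `denoise` on a page string returns the list of surviving text fragments. Link density is used here only because it can be computed from the DOM alone; a wrapper that has access to rendered block geometry could replace `is_noise` with a visual test.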


