Web information extraction for content augmentation

Proceedings. IEEE International Conference on Multimedia and Expo. Pub Date: 2002-11-07. DOI: 10.1109/ICME.2002.1035617

A. Janevski, N. Dimitrova

Citations: 7

Abstract

Today, users have to cope with an overwhelming number of TV channels and Web content sources. We introduce automatic content augmentation, a novel approach to contextual information extraction on behalf of the user, where the context is provided by the primary content source (i.e., the TV channel) and tailored by the user's preferences. A key aspect of this approach is Web information extraction (WebIE), which automatically derives structured information from unstructured Web documents. Our system executes WebIE tasks, each an instantiation of WebIE rules, which are our generic document processors. We present two WebIE approaches: diffusion WebIE, which crawls a wide set of Web pages and extracts information from a subset of the pertinent pages; and laser WebIE, which accesses a select set of Web pages and extracts narrowly defined information. We describe the architecture and implementation details of the system and provide detailed laser WebIE examples.
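
The relationship between rules and tasks in the abstract can be illustrated with a minimal sketch of a laser-style extraction. The abstract does not specify the rule format, so everything here is an assumption made for illustration: the `Rule` and `Task` classes, the regex template, the `FIELD_RULE` name, the sample page, and the `example.com` URL are all hypothetical and are not the authors' implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from html import unescape


@dataclass(frozen=True)
class Task:
    """Hypothetical WebIE task: a rule bound to one source and one field."""
    rule_name: str
    source: str          # where the document would be fetched from
    pattern: re.Pattern  # compiled, field-specific pattern

    def run(self, document: str) -> list[str]:
        # Return every captured value, with HTML entities decoded.
        return [unescape(value.strip()) for value in self.pattern.findall(document)]


@dataclass(frozen=True)
class Rule:
    """Hypothetical WebIE rule: a generic, reusable document processor."""
    name: str
    template: str  # regex with a {keyword} placeholder and one capture group

    def instantiate(self, keyword: str, source: str) -> Task:
        # Binding the generic template to a concrete keyword yields a task.
        regex = self.template.format(keyword=re.escape(keyword))
        return Task(self.name, source, re.compile(regex, re.IGNORECASE | re.DOTALL))


# Assumed rule: read the table cell that follows a labelled cell.
FIELD_RULE = Rule("table-field", r"<td>\s*{keyword}\s*</td>\s*<td>(.*?)</td>")

# Stand-in for a page from a "select set" of sources (no network access here).
PAGE = """
<table>
<tr><td>Director</td><td>Jane Doe</td></tr>
<tr><td>Genre</td><td>Drama &amp; Mystery</td></tr>
</table>
"""

genre_task = FIELD_RULE.instantiate("Genre", source="https://example.com/movie")
director_task = FIELD_RULE.instantiate("Director", source="https://example.com/movie")

print(genre_task.run(PAGE))
print(director_task.run(PAGE))
```

In this sketch one generic rule yields two narrowly scoped tasks against the same page, which mirrors the laser WebIE idea of extracting tightly defined fields from known sources. A diffusion-style variant would instead need a crawler and a relevance filter ahead of the extraction step, which is not shown here.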
