Newspaper Article Extraction Using Hierarchical Fixed Point Model

2014 11th IAPR International Workshop on Document Analysis Systems. Pub Date: 2014-04-07. DOI: 10.1109/DAS.2014.42

Anukriti Bansal, S. Chaudhury, Sumantra Dutta Roy, J. B. Srivastava

Citations: 14

Abstract

This paper presents a novel learning-based framework for extracting articles from newspaper images using a fixed-point model. The input to the system comprises blocks of text and graphics, obtained using standard image processing techniques. The fixed-point model uses contextual information and the features of each block to learn the layout of newspaper images, and attains a contraction mapping to assign a unique label to every block. We use a hierarchical model that works in two stages. In the first stage, a semantic label (heading, sub-heading, text block, image, or caption) is assigned to each segmented block. These labels are then used as input to the second stage, which groups related blocks into news articles. Experimental results show the applicability of our algorithm to newspaper labeling and article extraction.
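The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of fixed-point contextual labeling, not the authors' implementation. It assumes each block has per-label scores derived from its own features (`unary`, a stand-in for a learned function) and a list of adjacent blocks (`neighbors`). The averaging context function, the `ctx_weight` parameter, and all names are hypothetical choices made for illustration.

```python
import math

LABELS = ["heading", "sub-heading", "text-block", "image", "caption"]

def softmax(scores):
    """Turn raw scores into a probability vector (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fixed_point_labels(unary, neighbors, ctx_weight=0.5, tol=1e-8, max_iter=200):
    """Iterate q = f(unary, q) until the label beliefs stop changing.

    unary[i]     : per-label scores for block i (hypothetical stand-in for
                   a learned function of the block's own features)
    neighbors[i] : indices of the blocks adjacent to block i on the page
    ctx_weight   : assumed weight of the context term; kept small so the
                   update settles quickly in this toy setting
    """
    n, k = len(unary), len(unary[0])
    q = [[1.0 / k] * k for _ in range(n)]  # start from uniform beliefs
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        new_q = []
        for i in range(n):
            # Context term: average current belief of the neighbouring blocks.
            ctx = [0.0] * k
            for j in neighbors[i]:
                for c in range(k):
                    ctx[c] += q[j][c] / len(neighbors[i])
            scores = [unary[i][c] + ctx_weight * ctx[c] for c in range(k)]
            new_q.append(softmax(scores))
        # Largest change in any belief entry during this sweep.
        delta = max(abs(a - b) for old, new in zip(q, new_q)
                    for a, b in zip(old, new))
        q = new_q
        if delta < tol:
            break  # beliefs have reached (approximately) a fixed point
    labels = [LABELS[max(range(k), key=lambda c: row[c])] for row in q]
    return labels, q, n_iter

# Toy page: block 0 looks like a heading, block 1 like body text, and
# block 2 is ambiguous between body text and caption on its own features.
unary = [
    [4.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0],
]
neighbors = [[1], [0, 2], [1]]
labels, beliefs, n_iter = fixed_point_labels(unary, neighbors)
print(labels, n_iter)
```

In this toy example the ambiguous third block is resolved by the beliefs of its neighbour rather than by its own features. In a two-stage arrangement like the one the abstract describes, the labels returned here could be passed as additional input to a second labeling pass that assigns article identifiers, though the details of that stage are not given in the abstract.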

