Anukriti Bansal, S. Chaudhury, Sumantra Dutta Roy, J. B. Srivastava
{"title":"基于层次不动点模型的报纸文章提取","authors":"Anukriti Bansal, S. Chaudhury, Sumantra Dutta Roy, J. B. Srivastava","doi":"10.1109/DAS.2014.42","DOIUrl":null,"url":null,"abstract":"This paper presents a novel learning based framework to extract articles from newspaper images using a Fixed-Point Model. The input to the system comprises blocks of text and graphics, obtained using standard image processing techniques. The fixed point model uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping to assign a unique label to every block. We use a hierarchical model which works in two stages. In the first stage, a semantic label (heading, sub-heading, text-blocks, image and caption) is assigned to each segmented block. The labels are then used as input to the next stage to group the related blocks into news articles. Experimental results show the applicability of our algorithm in newspaper labeling and article extraction.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Newspaper Article Extraction Using Hierarchical Fixed Point Model\",\"authors\":\"Anukriti Bansal, S. Chaudhury, Sumantra Dutta Roy, J. B. Srivastava\",\"doi\":\"10.1109/DAS.2014.42\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a novel learning based framework to extract articles from newspaper images using a Fixed-Point Model. The input to the system comprises blocks of text and graphics, obtained using standard image processing techniques. The fixed point model uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping to assign a unique label to every block. 
We use a hierarchical model which works in two stages. In the first stage, a semantic label (heading, sub-heading, text-blocks, image and caption) is assigned to each segmented block. The labels are then used as input to the next stage to group the related blocks into news articles. Experimental results show the applicability of our algorithm in newspaper labeling and article extraction.\",\"PeriodicalId\":220495,\"journal\":{\"name\":\"2014 11th IAPR International Workshop on Document Analysis Systems\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-04-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 11th IAPR International Workshop on Document Analysis Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DAS.2014.42\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 11th IAPR International Workshop on Document Analysis Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DAS.2014.42","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Newspaper Article Extraction Using Hierarchical Fixed Point Model
This paper presents a novel learning-based framework for extracting articles from newspaper images using a fixed-point model. The input to the system comprises blocks of text and graphics obtained using standard image processing techniques. The fixed-point model uses contextual information and the features of each block to learn the layout of newspaper images, and attains a contraction mapping that assigns a unique label to every block. We use a hierarchical model that works in two stages. In the first stage, a semantic label (heading, sub-heading, text block, image, or caption) is assigned to each segmented block. These labels are then used as input to the second stage, which groups related blocks into news articles. Experimental results show the applicability of our algorithm to newspaper labeling and article extraction.
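The idea of labeling blocks through a contraction mapping can be illustrated with a minimal sketch. This is not the authors' implementation: the update rule (a softmax over each block's own scores plus the averaged label beliefs of its neighbors), the function name `fixed_point_labels`, the `weight` parameter, the three-label subset, and the toy scores are all assumptions made for illustration. In the paper the mapping is learned from data; here it is fixed by hand so that the iteration itself is easy to follow.

```python
import math

# Illustrative subset of the semantic labels named in the abstract.
LABELS = ["heading", "text", "caption"]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_point_labels(unary, neighbors, weight=0.5, tol=1e-9, max_iter=200):
    """Iterate q <- f(unary, q) until the label beliefs q stop changing.

    unary[i]     : per-label scores from block i's own features (assumed given)
    neighbors[i] : indices of the blocks adjacent to block i (the context)
    weight       : strength of the contextual term (hypothetical parameter);
                   keeping it small makes the update a contraction, so the
                   iteration converges to a unique fixed point
    """
    n, k = len(unary), len(unary[0])
    q = [softmax(u) for u in unary]
    iterations = 0
    for iterations in range(1, max_iter + 1):
        new_q = []
        for i in range(n):
            # Average the current beliefs of the neighboring blocks.
            context = [0.0] * k
            for j in neighbors[i]:
                for c in range(k):
                    context[c] += q[j][c] / len(neighbors[i])
            new_q.append(
                softmax([unary[i][c] + weight * context[c] for c in range(k)])
            )
        change = max(
            abs(new_q[i][c] - q[i][c]) for i in range(n) for c in range(k)
        )
        q = new_q
        if change < tol:
            break
    labels = [LABELS[max(range(k), key=row.__getitem__)] for row in q]
    return labels, q, iterations


# Toy page: three blocks stacked in one column, each adjacent to the next.
unary = [[3.0, 0.0, 0.0], [0.2, 0.3, 0.1], [0.0, 2.5, 0.0]]
neighbors = [[1], [0, 2], [1]]
labels, beliefs, iterations = fixed_point_labels(unary, neighbors)
print(labels, iterations)
```

Because every block's belief depends on its neighbors' beliefs, the labels cannot be computed in a single pass; the loop repeats the same update until the beliefs reproduce themselves, which is the fixed point. In the hierarchical setting described above, the resulting labels would then serve as input features for a second model of the same kind that decides which blocks belong to the same article.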