自动检测和分割目录页从文档图像

Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. Pub Date : 2003-08-03 DOI:10.1109/ICDAR.2003.1227697

Sekhar Mandal, S. Chowdhury, A. Das, B. Chanda

{"title":"自动检测和分割目录页从文档图像","authors":"Sekhar Mandal, S. Chowdhury, A. Das, B. Chanda","doi":"10.1109/ICDAR.2003.1227697","DOIUrl":null,"url":null,"abstract":"With an aim to extract the structural information from the table of contents (TOC) to help develop a digital document library, the requirement of identifying/segmenting the TOC page is obvious. The objective to create a digital document library is to provide a non-labour intensive, cheap and flexible way of storing, representing and managing the paper document in electronic form to facilitate indexing, viewing, printing and extracting the intended portions. Information from the TOC pages is to be extracted for use in a document database for effective retrieval of the required pages. We present a fully automatic identification and segmentation of a table of contents (TOC) page from a scanned document.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":"{\"title\":\"Automated detection and segmentation of table of contents page from document images\",\"authors\":\"Sekhar Mandal, S. Chowdhury, A. Das, B. Chanda\",\"doi\":\"10.1109/ICDAR.2003.1227697\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With an aim to extract the structural information from the table of contents (TOC) to help develop a digital document library, the requirement of identifying/segmenting the TOC page is obvious. The objective to create a digital document library is to provide a non-labour intensive, cheap and flexible way of storing, representing and managing the paper document in electronic form to facilitate indexing, viewing, printing and extracting the intended portions. Information from the TOC pages is to be extracted for use in a document database for effective retrieval of the required pages. We present a fully automatic identification and segmentation of a table of contents (TOC) page from a scanned document.\",\"PeriodicalId\":249193,\"journal\":{\"name\":\"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-08-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"34\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2003.1227697\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2003.1227697","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 34

摘要

为了从目录(TOC)中提取结构信息，以帮助开发数字文档库，对TOC页面进行识别/分割的需求是显而易见的。建立数码文件图书馆的目的，是提供一种不需要耗费大量人力、廉价和灵活的方式，以电子形式储存、表示和管理纸质文件，方便索引、浏览、打印和提取所需的部分。从TOC页面中提取的信息将用于文档数据库中，以便有效地检索所需的页面。我们提出了一个完全自动识别和分割目录(TOC)页从扫描文档。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Automated detection and segmentation of table of contents page from document images

With an aim to extract the structural information from the table of contents (TOC) to help develop a digital document library, the requirement of identifying/segmenting the TOC page is obvious. The objective to create a digital document library is to provide a non-labour intensive, cheap and flexible way of storing, representing and managing the paper document in electronic form to facilitate indexing, viewing, printing and extracting the intended portions. Information from the TOC pages is to be extracted for use in a document database for effective retrieval of the required pages. We present a fully automatic identification and segmentation of a table of contents (TOC) page from a scanned document.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

自引率

0.00%

发文量