SPaSe - Multi-Label Page Segmentation for Presentation Slides

2019 IEEE Winter Conference on Applications of Computer Vision (WACV). Pub Date: 2019-01-01. DOI: 10.1109/WACV.2019.00082

Monica Haurilet, Ziad Al-Halah, R. Stiefelhagen


Citations: 11

Abstract

We introduce the first benchmark dataset for slide-page segmentation. Presentation slides are one of the most prominent document types used to exchange ideas across the web, educational institutions and businesses. The format is characterized by a complex layout that contains a rich variety of graphical (e.g. diagram, logo), textual (e.g. heading, affiliation) and structural (e.g. enumeration, legend) components. This vast and popular knowledge source remains inaccessible to modern machine learning techniques due to the lack of annotated data. To tackle this issue, we introduce SPaSe (Slide Page Segmentation), a novel dataset containing 2000 slides in total with dense, pixel-wise annotations of 25 classes. We show that slide segmentation has several properties that set it apart from the common image segmentation problem. Regions of distinct classes tend to overlap heavily, so a single pixel can carry several labels, which makes this segmentation task a multi-label problem. Furthermore, many of the classes frequently encountered in slides are location sensitive (e.g. title, footnote). Hence, we believe our dataset represents a challenging and interesting benchmark for novel segmentation models. Finally, we evaluate state-of-the-art deep segmentation models on our dataset and show that it is suitable for developing deep learning models without any need for pre-training. Our dataset will be released to the public to foster further research on this interesting task.
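
To make the multi-label aspect concrete, the following is a minimal sketch of how overlapping pixel-wise annotations could be represented and scored. It is not the dataset's actual file format or the paper's evaluation code: the one-boolean-channel-per-class layout, the class indices `DIAGRAM` and `LEGEND`, the rectangular regions, and the helper names are all assumptions made for illustration. Only the class count of 25 comes from the abstract.

```python
import numpy as np

NUM_CLASSES = 25  # class count stated in the abstract


def make_multilabel_mask(height, width, regions, num_classes=NUM_CLASSES):
    """Build a (num_classes, H, W) boolean mask from rectangular regions.

    Each region is (class_id, top, left, bottom, right). Because every class
    has its own channel, one pixel can belong to several classes at once.
    """
    mask = np.zeros((num_classes, height, width), dtype=bool)
    for class_id, top, left, bottom, right in regions:
        mask[class_id, top:bottom, left:right] = True
    return mask


def per_class_iou(pred, target):
    """Intersection over union per class; NaN where a class is absent in both."""
    inter = np.logical_and(pred, target).sum(axis=(1, 2)).astype(float)
    union = np.logical_or(pred, target).sum(axis=(1, 2)).astype(float)
    iou = np.full(pred.shape[0], np.nan)
    present = union > 0
    iou[present] = inter[present] / union[present]
    return iou


# Hypothetical class indices; the real label mapping is not given in the abstract.
DIAGRAM, LEGEND = 0, 1

# Ground truth: a diagram fills the 4x4 slide, with a legend in its top half.
target = make_multilabel_mask(4, 4, [(DIAGRAM, 0, 0, 4, 4), (LEGEND, 0, 0, 2, 4)])
# Prediction: the legend is found, but only the top half of the diagram.
pred = make_multilabel_mask(4, 4, [(DIAGRAM, 0, 0, 2, 4), (LEGEND, 0, 0, 2, 4)])

# Pixels carrying both labels in the ground truth.
overlap_pixels = int(np.logical_and(target[DIAGRAM], target[LEGEND]).sum())
iou = per_class_iou(pred, target)
```

A single-label encoding (one integer class id per pixel) could not store the legend pixels here without discarding the diagram label underneath them, which is why the sketch keeps a separate channel per class.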

