Title: Classification of handwritten annotations in mixed-media documents
Authors: Amanda Dash, A. Albu
Venue: 2022 19th Conference on Robots and Vision (CRV)
Published: 2022-05-01
DOI: 10.1109/CRV55824.2022.00027 (https://doi.org/10.1109/CRV55824.2022.00027)
Citations: 0
Abstract
Handwritten annotations in documents contain valuable information, but they are challenging to detect and identify. This paper addresses this challenge. We propose an algorithm for generating a novel mixed-media document dataset, Annotated Docset, which consists of 14 classes of machine-printed and handwritten elements and annotations. We also propose a novel loss function, Dense Loss, which can correctly identify small objects in complex documents when used with fully convolutional networks (e.g., U-Net, DeepLabV3+). Dense Loss is a compound function that uses local region homogeneity to promote contiguous, smooth segmentation predictions, while also using an L1-norm loss to reconstruct the dense-labelled ground truth. By using regression instead of a probabilistic approach to pixel classification, we avoid the pitfalls of training on datasets with small or underrepresented objects. We show that our loss function outperforms other semantic segmentation loss functions on imbalanced datasets, i.e., datasets containing few elements that occupy small areas. Experimental results show that the proposed method achieved a mean Intersection-over-Union (mIoU) score of 0.7163 across all document classes and 0.6290 for handwritten annotations, outperforming state-of-the-art loss functions.
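The abstract describes Dense Loss as a compound of two terms: an L1-norm reconstruction of the dense-labelled ground truth, plus a local-region-homogeneity term that encourages contiguous, smooth predictions. The paper's exact formulation is not given here, so the following is only an illustrative sketch of such a compound loss; the smoothness term below (a total-variation-style penalty on neighbouring pixel differences) and the `smooth_weight` parameter are assumptions, not the authors' definition.

```python
import numpy as np

def dense_loss(pred, target, smooth_weight=0.5):
    """Hypothetical sketch of a compound regression loss in the spirit of
    Dense Loss: L1 reconstruction plus a local-homogeneity penalty.

    pred, target: 2-D arrays of dense (real-valued) per-pixel labels.
    smooth_weight: assumed balance parameter, not from the paper.
    """
    # L1-norm term: reconstruct the dense-labelled ground truth directly,
    # treating pixel labelling as regression rather than classification.
    l1 = np.abs(pred - target).mean()

    # Local homogeneity term: penalise differences between vertically and
    # horizontally adjacent predictions, promoting contiguous, smooth regions.
    dy = np.abs(pred[1:, :] - pred[:-1, :]).mean()
    dx = np.abs(pred[:, 1:] - pred[:, :-1]).mean()

    return l1 + smooth_weight * (dy + dx)
```

Because the loss regresses dense label values rather than normalising class probabilities over the whole image, a small object's contribution is not drowned out by the dominant background class, which is the pitfall the abstract attributes to probabilistic pixel classification on imbalanced data.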