LPGA: Line-of-Sight Parsing with Graph-Based Attention for Math Formula Recognition

2019 International Conference on Document Analysis and Recognition (ICDAR). Pub Date: 2019-09-01. DOI: 10.1109/ICDAR.2019.00109

Mahshad Mahdavi, M. Condon, Kenny Davila


Citations: 15

Abstract

We present a model for recognizing typeset math formula images from connected components or symbols. In our approach, connected components are used to construct a line-of-sight (LOS) graph. The graph is used both to reduce the search space for formula structure interpretations, and to guide a classification attention model using separate channels for inputs and their local visual context. For classification, we used visual densities with Random Forests for initial development, and then converted this to a Convolutional Neural Network (CNN) with a second branch to capture context for each input image. Formula structure is extracted as a directed spanning tree from a weighted LOS graph using Edmonds' algorithm. We obtain strong results for formulas without grids or matrices in the InftyCDB-2 dataset (90.89% from components, 93.5% from symbols). Using tools from the CROHME handwritten formula recognition competitions, we were able to compile all symbol and structure recognition errors for analysis. Our data and source code are publicly available.


