Contrastive representation enhancement and learning for handwritten mathematical expression recognition

IF 3.9 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters Pub Date : 2024-08-30 DOI:10.1016/j.patrec.2024.08.021

Zihao Lin, Jinrong Li, Gang Dai, Tianshui Chen, Shuangping Huang, Jianmin Lin

Pattern Recognition Letters, Volume 186, Pages 14-20. https://www.sciencedirect.com/science/article/pii/S0167865524002538

Citations: 0

Abstract

Handwritten mathematical expression recognition (HMER) is an appealing task because of its wide applications and open research challenges. Previous deep learning-based methods use a string decoder to emphasize symbol awareness and achieve considerable recognition performance. However, these methods still struggle to recognize handwritten symbols whose appearance varies widely, because large appearance variations make symbol representations ambiguous. Our intuition is therefore to use printed expressions, which have a uniform appearance, as templates for handwritten expressions, mitigating the effect of varying symbol appearance. In this paper, we propose a contrastive learning method in which handwritten symbols with identical semantics are clustered together under the guidance of printed symbols, leading the model to learn more robust semantic representations of symbols. Specifically, we propose an anchor generation scheme to obtain printed expression images corresponding to the handwritten expressions. We then propose a contrastive learning objective, termed Semantic-NCE Loss, that pulls together printed and handwritten symbols with identical semantics. Finally, we employ a string decoder to parse the calibrated semantic representations and output the recognized expression symbols. Experimental results on the benchmark datasets CROHME 14/16/19 demonstrate that our method noticeably improves the recognition accuracy of handwritten expressions and outperforms standard string decoder methods.
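The abstract names Semantic-NCE Loss but does not give its formula. As a rough illustration of the general idea only (pulling each handwritten symbol feature toward printed features of the same symbol class and away from the others), the following is a minimal sketch of a generic label-aware InfoNCE-style loss. It is not the authors' formulation: the function names, the use of cosine similarity, the temperature value, and the toy 2-D features are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small epsilon avoids division by zero


def printed_anchor_nce(handwritten, hw_labels, printed, pr_labels, temperature=0.1):
    """Generic InfoNCE-style loss with printed features as anchors (illustrative only).

    For each handwritten feature, printed features with the same label are
    positives and all other printed features are negatives. Handwritten
    features with no matching printed label are skipped.
    """
    losses = []
    for h, y in zip(handwritten, hw_labels):
        logits = [cosine(h, p) / temperature for p in printed]
        # Numerically stable log-sum-exp over all printed anchors.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        positives = [l for l, yp in zip(logits, pr_labels) if yp == y]
        if not positives:
            continue
        # Negative mean log-probability assigned to the positive anchors.
        losses.append(-sum(l - log_denom for l in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: two printed anchors for the hypothetical symbol classes "x" and "y".
printed = [[1.0, 0.0], [0.0, 1.0]]
pr_labels = ["x", "y"]
aligned = [[0.9, 0.1], [0.1, 0.9]]     # handwritten features close to their anchors
misaligned = [[0.1, 0.9], [0.9, 0.1]]  # handwritten features close to the wrong anchors
print(printed_anchor_nce(aligned, ["x", "y"], printed, pr_labels))
print(printed_anchor_nce(misaligned, ["x", "y"], printed, pr_labels))
```

In this toy setup the loss is small when handwritten features sit near the printed anchor of their own class and large when they sit near a different class, which is the behavior a contrastive objective of this kind is meant to reward.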

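Source journal

Pattern Recognition Letters (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 12.40

Self-citation rate: 5.90%

Articles published: 287

Review time: 9.1 months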

About the journal: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.