{"title":"DGNet:基于可变形卷积和全局上下文关注的手写数学公式识别网络","authors":"Cuihong Wen, Lemin Yin, Shuai Liu","doi":"10.1007/s11036-024-02315-x","DOIUrl":null,"url":null,"abstract":"<p>The Handwritten Mathematical Expression Recognition (HMER) task aims to generate corresponding LATEX sequences from images of handwritten mathematical expressions. Currently, the encoder-decoder architecture has made significant progress in this task. However, the architecture based on the DenseNet encoder fails to adequately consider the unique features of handwritten mathematical expressions (HME) and the similarity between different characters. Additionally, the decoder, with its small receptive field during the decoding process, fails to effectively capture the spatial positional information of the targets, resulting in a lack of global contextual information during decoding. To address these issues, this paper proposes a neural network called DGNet based on deformable convolution and global contextual attention. Our network takes into full consideration the sparse nature of handwritten mathematical formulas and utilizes the properties of deformable convolution, allowing the convolution kernel to deform based on the content of the neighborhood. This enables our model to better adapt to geometric changes and other deformations in handwritten mathematical expressions. Simultaneously, we introduce GCAttention in optimizing the feature part to fully aggregate global contextual features of both position and channel. In experiments, our model achieved accuracies of 58.51%, 56.32%, and 56.1% on the CROHME 2014, 2016, and 2019 datasets, respectively.</p>","PeriodicalId":501103,"journal":{"name":"Mobile Networks and Applications","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DGNet: A Handwritten Mathematical Formula Recognition Network Based on Deformable Convolution and Global Context Attention\",\"authors\":\"Cuihong Wen, Lemin Yin, Shuai Liu\",\"doi\":\"10.1007/s11036-024-02315-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The Handwritten Mathematical Expression Recognition (HMER) task aims to generate corresponding LATEX sequences from images of handwritten mathematical expressions. Currently, the encoder-decoder architecture has made significant progress in this task. However, the architecture based on the DenseNet encoder fails to adequately consider the unique features of handwritten mathematical expressions (HME) and the similarity between different characters. Additionally, the decoder, with its small receptive field during the decoding process, fails to effectively capture the spatial positional information of the targets, resulting in a lack of global contextual information during decoding. To address these issues, this paper proposes a neural network called DGNet based on deformable convolution and global contextual attention. Our network takes into full consideration the sparse nature of handwritten mathematical formulas and utilizes the properties of deformable convolution, allowing the convolution kernel to deform based on the content of the neighborhood. This enables our model to better adapt to geometric changes and other deformations in handwritten mathematical expressions. Simultaneously, we introduce GCAttention in optimizing the feature part to fully aggregate global contextual features of both position and channel. 
In experiments, our model achieved accuracies of 58.51%, 56.32%, and 56.1% on the CROHME 2014, 2016, and 2019 datasets, respectively.</p>\",\"PeriodicalId\":501103,\"journal\":{\"name\":\"Mobile Networks and Applications\",\"volume\":\"20 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Mobile Networks and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s11036-024-02315-x\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mobile Networks and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11036-024-02315-x","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Handwritten Mathematical Expression Recognition (HMER) task aims to generate the corresponding LaTeX sequence from an image of a handwritten mathematical expression. Encoder-decoder architectures have made significant progress on this task. However, architectures built on a DenseNet encoder do not adequately account for the distinctive characteristics of handwritten mathematical expressions (HME) or the similarity between different characters. In addition, the decoder's small receptive field during decoding fails to capture the spatial position of targets effectively, so global contextual information is missing at decoding time. To address these issues, this paper proposes DGNet, a neural network based on deformable convolution and global context attention. The network fully accounts for the sparse nature of handwritten mathematical formulas and exploits deformable convolution, which allows the convolution kernel to deform according to the content of its neighborhood; the model can therefore better adapt to geometric variation and other deformations in handwritten expressions. In addition, we introduce GCAttention into the feature-refinement stage to aggregate global contextual features across both position and channel. In experiments, the model achieves accuracies of 58.51%, 56.32%, and 56.1% on the CROHME 2014, 2016, and 2019 datasets, respectively.
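The deformable-convolution idea in the encoder can be illustrated with a short sketch. The block below is not the authors' implementation; it is a minimal PyTorch example (the module name, channel sizes, and the use of torchvision's DeformConv2d are assumptions) showing how a small convolution predicts per-position offsets so that the 3x3 sampling grid can bend toward the strokes of a sparse handwritten expression.

```python
# Minimal sketch (not the paper's code): a 3x3 deformable convolution block.
# A small conv predicts (dx, dy) offsets for every kernel element and position,
# and DeformConv2d samples the input at those shifted locations.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableConvBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # 2 offset values (dx, dy) per kernel element and spatial position.
        self.offset_conv = nn.Conv2d(
            in_channels, 2 * kernel_size * kernel_size,
            kernel_size=kernel_size, padding=padding)
        self.deform_conv = DeformConv2d(
            in_channels, out_channels,
            kernel_size=kernel_size, padding=padding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(x)        # (B, 2*K*K, H, W)
        return self.deform_conv(x, offsets)  # sampling grid deformed by offsets


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 128)       # e.g. an encoder feature map of an HME image
    block = DeformableConvBlock(64, 64)
    print(block(feat).shape)                 # torch.Size([2, 64, 32, 128])
```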
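GCAttention, as described in the abstract, aggregates global contextual features over both positions and channels. A common formulation of this idea is the GCNet-style global context block; the sketch below follows that formulation as one plausible reading of the paper's module (the exact design, channel sizes, and reduction ratio here are assumptions).

```python
# Minimal sketch of a GCNet-style global context block: learned spatial attention
# pools the whole feature map into one context vector, a channel bottleneck
# transforms it, and the result is broadcast-added back to every position.
import torch
import torch.nn as nn


class GlobalContextAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)  # spatial attention logits
        self.softmax = nn.Softmax(dim=-1)
        hidden = max(channels // reduction, 1)
        self.transform = nn.Sequential(                             # channel transform
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Context modelling: softmax over all H*W positions, then a weighted sum.
        mask = self.softmax(self.context_mask(x).view(b, 1, h * w))        # (B, 1, HW)
        context = torch.bmm(x.view(b, c, h * w), mask.transpose(1, 2))     # (B, C, 1)
        context = context.view(b, c, 1, 1)
        # Every position receives the same transformed global context.
        return x + self.transform(context)


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 128)
    gca = GlobalContextAttention(64)
    print(gca(feat).shape)  # torch.Size([2, 64, 32, 128])
```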