Bigram Label Regularization to Reduce Over-Segmentation on Inline Math Expression Detection

Xing Wang, Zelun Wang, Jyh-Charn S. Liu
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019-09-01
DOI: 10.1109/ICDAR.2019.00069 (https://doi.org/10.1109/ICDAR.2019.00069)
Citations: 6

Abstract

An inline mathematical expression (ME) is a math expression blended into a plaintext sentence in a scientific paper. Detecting inline MEs is non-trivial because scientific publications use font styles without restriction and the boundaries between MEs and plaintext are blurred. For instance, many inline MEs detected by existing algorithms are incorrectly split into multiple parts because a few characters are misidentified. In this paper, we propose a bigram regularization model to resolve the split problem in inline ME detection. The model incorporates neighboring constraints during labeling of ME vs. plaintext. Experimental results show that this technique significantly reduces the splits of inline MEs, with small gains in the false and miss rates. Compared with a CRF model, our model achieves a higher F1 score and a lower miss rate.
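
The abstract does not give the model's formulation, so the following is only a generic sketch of how a neighboring (bigram) constraint can suppress splits. It is not the authors' method. It assumes a hypothetical per-character probability `p_me` from some independent classifier and a hypothetical `switch_penalty` charged whenever two adjacent labels differ, and it decodes the cheapest label sequence with dynamic programming.

```python
import math


def decode_with_bigram_penalty(p_me, switch_penalty=2.0):
    """Label each character 0 (plaintext) or 1 (ME).

    p_me[i] is an assumed per-character ME probability (hypothetical input).
    switch_penalty is an assumed cost added when adjacent labels differ.
    """
    eps = 1e-9
    n = len(p_me)
    if n == 0:
        return []
    # Unary cost of each label: negative log-likelihood.
    unary = [(-math.log(max(1.0 - p, eps)), -math.log(max(p, eps))) for p in p_me]
    cost = [list(unary[0])]
    back = []  # back[i - 1][cur] = best previous label for position i
    for i in range(1, n):
        row, ptr = [], []
        for cur in (0, 1):
            cands = [
                cost[-1][prev] + (switch_penalty if prev != cur else 0.0)
                for prev in (0, 1)
            ]
            best_prev = 0 if cands[0] <= cands[1] else 1
            row.append(cands[best_prev] + unary[i][cur])
            ptr.append(best_prev)
        cost.append(row)
        back.append(ptr)
    # Trace the cheapest path backwards.
    label = 0 if cost[-1][0] <= cost[-1][1] else 1
    labels = [label]
    for ptr in reversed(back):
        label = ptr[label]
        labels.append(label)
    labels.reverse()
    return labels


def count_me_segments(labels):
    """Count maximal runs of label 1, i.e. detected ME fragments."""
    return sum(
        1 for i, lab in enumerate(labels)
        if lab == 1 and (i == 0 or labels[i - 1] == 0)
    )


# Made-up scores: the middle character (0.4) is weakly misidentified.
scores = [0.05, 0.9, 0.9, 0.4, 0.9, 0.9, 0.05]
independent = decode_with_bigram_penalty(scores, switch_penalty=0.0)
regularized = decode_with_bigram_penalty(scores, switch_penalty=2.0)
```

With these made-up scores, independent labeling splits the expression into two fragments at the weak character, while the penalized decoding keeps it as one. The penalty value trades fewer splits against the risk of absorbing neighboring plaintext, which is one plausible reading of the small changes in false and miss rates mentioned above.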