Minh-Thang Nguyen, Thi-Lan Le, Lan Huong Nguyen Thi, T. Nguyen
Published in: 2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), 1 October 2021. DOI: 10.1109/MAPR53640.2021.9585254
DS-YOLOv5: Deformable and Scalable YOLOv5 for Mathematical Formula Detection in Scientific Documents
Mathematical formula detection (MFD) is a prerequisite step for the digitization of scientific documents. The MFD task has two key challenges: the large difference in scale between embedded and isolated formulas, and the large variation in aspect ratio (height to width). Most existing approaches rely on page segmentation, and their detection accuracy still needs improvement because segmentation errors occur on complex documents. In this work, to address the important problem of scale variation, we assess the performance of a multi-scale deformable method for MFD based on deformable convolution, image representation, and the YOLOv5 detector. We evaluate the proposed method on Marmot, an existing benchmark dataset. The experimental results show that the proposed method outperforms previous methods on Marmot by a large margin. In particular, it achieves correct detection accuracies of 82.42% on embedded formulas and 90.69% on isolated formulas, a significant reduction in error.
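The abstract names deformable convolution as a core building block. The NumPy sketch below illustrates the generic, standard deformable convolution sampling step. It is not the authors' DS-YOLOv5 code. Each kernel tap reads the input at its regular grid position plus a fractional offset, using bilinear interpolation. The function names `bilinear_sample` and `deform_conv2d` are hypothetical. The sketch also assumes a single-channel input, stride 1, zero padding, and offsets supplied from outside. In a real network, a parallel convolution layer would predict those offsets.

```python
import numpy as np


def bilinear_sample(x: np.ndarray, py: float, px: float) -> float:
    """Bilinearly interpolate the 2-D array x at fractional (py, px).

    Positions outside the array contribute zero (zero padding).
    """
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    dy, dx = py - y0, px - x0
    val = 0.0
    # Weighted sum over the four integer neighbours of (py, px).
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * x[yy, xx]
    return val


def deform_conv2d(x: np.ndarray, weight: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Single-channel deformable convolution (cross-correlation), stride 1, 'same' size.

    x:       (H, W) input feature map
    weight:  (K, K) kernel, K odd
    offsets: (H, W, K*K, 2) per-position, per-tap offsets as (dy, dx)
    """
    H, W = x.shape
    K = weight.shape[0]
    r = K // 2
    # Regular sampling grid of the kernel, relative to the centre tap.
    grid = [(i - r, j - r) for i in range(K) for j in range(K)]
    out = np.zeros((H, W))
    for y in range(H):
        for xc in range(W):
            acc = 0.0
            for k, (gy, gx) in enumerate(grid):
                dy, dx = offsets[y, xc, k]
                # Sample at the grid position shifted by the (learned) offset.
                acc += weight[gy + r, gx + r] * bilinear_sample(x, y + gy + dy, xc + gx + dx)
            out[y, xc] = acc
    return out
```

When every offset is zero, the sketch reduces to an ordinary K×K convolution with zero padding. Non-zero offsets let the receptive field stretch horizontally or vertically. This is the usual intuition for why deformable sampling may suit formulas whose aspect ratios vary widely, such as short, wide inline formulas versus tall isolated ones.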