An N-ary Tree-based Model for Similarity Evaluation on Mathematical Formulae
Authors: Yifan Dai, Liangyu Chen, Zihan Zhang
Venue: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2578-2584
Published: 2020-10-11
DOI: 10.1109/SMC42975.2020.9283495 (https://doi.org/10.1109/SMC42975.2020.9283495)
Citations: 5
Abstract
Accurate and efficient measures of similarity between mathematical formulae play an important role in mathematical information retrieval. Most previous studies have focused on representing formulae in different forms to capture their features and on combining these representations with traditional structure-matching algorithms. This paper presents a new unsupervised model, the N-ary Tree-based Formula Embedding Model (NTFEM), for the task of mathematical similarity evaluation. We represent each formula as an n-ary tree, convert the tree into a linear sequence that can be treated as an input sentence, and then embed the formula using a word embedding model. Based on the characteristics of mathematical formulae, a weighting function is then applied to obtain the final weighted-average embedding vector. In experiments on the NTCIR-12 Wikipedia Formula Browsing Task, our model outperforms previous formula search engines on the Bpref metric. In addition, compared with traditional tree-based models, NTFEM not only improves retrieval effectiveness but also greatly reduces training time.
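The pipeline outlined in the abstract (n-ary tree, linear sequence, token embeddings, weighted average) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Node` class, the pre-order traversal, the hash-based `toy_embedding` (a stand-in for a trained word embedding model), and the `1 / (1 + depth)` weighting are all assumptions made for illustration, since the abstract does not specify the traversal order or the weighting function.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an n-ary formula tree (hypothetical helper, not from the paper)."""
    label: str
    children: list = field(default_factory=list)


def linearize(node, depth=0):
    """Pre-order traversal yielding (token, depth) pairs: the 'sentence' for a formula."""
    yield node.label, depth
    for child in node.children:
        yield from linearize(child, depth + 1)


def toy_embedding(token, dim=16):
    """Stand-in for a trained word embedding: a deterministic vector derived from a hash."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:dim]]


def embed_formula(root, dim=16):
    """Weighted average of token vectors. The 1 / (1 + depth) weight is an assumption."""
    total = [0.0] * dim
    weight_sum = 0.0
    for token, depth in linearize(root):
        weight = 1.0 / (1.0 + depth)
        vector = toy_embedding(token, dim)
        total = [t + weight * v for t, v in zip(total, vector)]
        weight_sum += weight
    return [t / weight_sum for t in total]


def cosine(u, v):
    """Cosine similarity between two formula vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# x^2 + y, built twice as separate trees with the same structure
f1 = Node("+", [Node("^", [Node("x"), Node("2")]), Node("y")])
f2 = Node("+", [Node("^", [Node("x"), Node("2")]), Node("y")])

tokens = [token for token, _ in linearize(f1)]
depths = [depth for _, depth in linearize(f1)]
similarity = cosine(embed_formula(f1), embed_formula(f2))
```

Because `toy_embedding` carries no learned semantics, similarity scores between different formulae are not meaningful here. Substituting vectors from a trained word embedding model, learned over a corpus of linearized formulae, would be the step that makes the comparison informative.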