An N-ary Tree-based Model for Similarity Evaluation on Mathematical Formulae

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2578-2584. Pub Date: 2020-10-11. DOI: 10.1109/SMC42975.2020.9283495

Yifan Dai, Liangyu Chen, Zihan Zhang


Citations: 5

Abstract

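Accurate and efficient measurement of the similarity between mathematical formulae plays an important role in mathematical information retrieval. Most previous studies have focused on representing formulae in various forms to capture their features, combined with traditional structure-matching algorithms. This paper presents a new unsupervised model, the N-ary Tree-based Formula Embedding Model (NTFEM), for evaluating the similarity of mathematical formulae. We represent each formula as an n-ary tree, convert the tree into a linear sequence that can be treated as an input sentence, and then embed the formula with a word embedding model. A weighting function based on the characteristics of mathematical formulae is used to compute the final weighted-average embedding vector. Experiments on the NTCIR-12 Wikipedia Formula Browsing Task show that our model outperforms previous formula search engines in terms of Bpref. In addition, compared with traditional tree-based models, NTFEM not only improves retrieval effectiveness but also greatly reduces training time.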
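The pipeline summarized in the abstract (formula tree, then a linear token sequence, then per-token word vectors, then a weighted average) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' NTFEM implementation. The abstract does not specify the traversal order, the embedding model, or the weighting function, so the pre-order traversal, the hand-written toy vectors standing in for a trained word-embedding model, the depth-based weight `1/(1+depth)`, and cosine similarity for comparing formulae are all assumptions. The names `Node`, `linearize`, `weight`, `embed`, and `cosine` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical n-ary formula tree node: an operator/operand label
    plus any number of ordered children."""
    label: str
    children: list["Node"] = field(default_factory=list)


def linearize(root: Node) -> list[tuple[str, int]]:
    """Turn the tree into a 'sentence' of (token, depth) pairs using a
    pre-order traversal (the traversal order is an assumption)."""
    out: list[tuple[str, int]] = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        out.append((node.label, depth))
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return out


def weight(depth: int) -> float:
    """Hypothetical weighting function: tokens nearer the root count more.
    Not the paper's weighting, which the abstract does not specify."""
    return 1.0 / (1 + depth)


def embed(tokens: list[tuple[str, int]],
          vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Weighted average of token vectors; tokens without a vector are skipped."""
    total = [0.0] * dim
    wsum = 0.0
    for tok, depth in tokens:
        vec = vectors.get(tok)
        if vec is None:
            continue
        w = weight(depth)
        for i in range(dim):
            total[i] += w * vec[i]
        wsum += w
    return [x / wsum for x in total] if wsum else total


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two formula vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # x^2 + 1 and y^2 + 1 as n-ary trees.
    f1 = Node("+", [Node("^", [Node("x"), Node("2")]), Node("1")])
    f2 = Node("+", [Node("^", [Node("y"), Node("2")]), Node("1")])
    print(linearize(f1))  # [('+', 0), ('^', 1), ('x', 2), ('2', 2), ('1', 1)]
    # Toy 3-D vectors standing in for a trained word-embedding model.
    toy = {"+": [1.0, 0.0, 0.0], "^": [0.0, 1.0, 0.0], "x": [0.0, 0.0, 1.0],
           "y": [0.0, 0.2, 0.9], "2": [0.3, 0.3, 0.0], "1": [0.3, 0.0, 0.3]}
    print(cosine(embed(linearize(f1), toy, 3), embed(linearize(f2), toy, 3)))
```

In a real system the `toy` table would be replaced by vectors from a word-embedding model trained on the linearized formula sequences, so that tokens appearing in similar structural contexts receive similar vectors.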




