{"title":"数学公式信息检索系统","authors":"Yong Hou","doi":"10.3233/jcm-226961","DOIUrl":null,"url":null,"abstract":"Design and implementation of the system for retrieving information about mathematical formulas – MFIRS. The structure of the system is mainly divided into the modules: input normalization, mathematical formula unification, mathematical formula encoding, text information feature extraction, mathematical formula feature extraction, mathematical formula indexing, retrieval and ranking. A method for extracting mathematical formulas and keywords based on FastText word embedding technology is proposed. This method can be used not only to get the structural features of the formula, but also to facilitate the calculation of the similarity of the formula by the vector result. At the same time, the model introduces the semantic features of context-rich mathematical formulas to improve the domain correlation of search results. The MathRetEval dataset was created based on about 7.9 × 105 arXiv documents and about 1.5 × 108 mathematical formulas. The scalability of the system is verified using this data set. The mathematical formulas can be written in the language TEX or MathML. When queried in the TEX language, it can be converted to a tree representation of the MathML representation and then indexed. This MFIRS is an information retrieval system for mathematical formulas with the features of mathematical perception, which can use the search for the similarity of partial formulas.","PeriodicalId":45004,"journal":{"name":"Journal of Computational Methods in Sciences and Engineering","volume":"39 9","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2023-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Mathematical formula information retrieval system\",\"authors\":\"Yong Hou\",\"doi\":\"10.3233/jcm-226961\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Design and implementation of the system for retrieving information about mathematical formulas – MFIRS. The structure of the system is mainly divided into the modules: input normalization, mathematical formula unification, mathematical formula encoding, text information feature extraction, mathematical formula feature extraction, mathematical formula indexing, retrieval and ranking. A method for extracting mathematical formulas and keywords based on FastText word embedding technology is proposed. This method can be used not only to get the structural features of the formula, but also to facilitate the calculation of the similarity of the formula by the vector result. At the same time, the model introduces the semantic features of context-rich mathematical formulas to improve the domain correlation of search results. The MathRetEval dataset was created based on about 7.9 × 105 arXiv documents and about 1.5 × 108 mathematical formulas. The scalability of the system is verified using this data set. The mathematical formulas can be written in the language TEX or MathML. When queried in the TEX language, it can be converted to a tree representation of the MathML representation and then indexed. This MFIRS is an information retrieval system for mathematical formulas with the features of mathematical perception, which can use the search for the similarity of partial formulas.\",\"PeriodicalId\":45004,\"journal\":{\"name\":\"Journal of Computational Methods in Sciences and Engineering\",\"volume\":\"39 9\",\"pages\":\"\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2023-12-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computational Methods in Sciences and Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/jcm-226961\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computational Methods in Sciences and Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/jcm-226961","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Design and implementation of the system for retrieving information about mathematical formulas – MFIRS. The structure of the system is mainly divided into the modules: input normalization, mathematical formula unification, mathematical formula encoding, text information feature extraction, mathematical formula feature extraction, mathematical formula indexing, retrieval and ranking. A method for extracting mathematical formulas and keywords based on FastText word embedding technology is proposed. This method can be used not only to get the structural features of the formula, but also to facilitate the calculation of the similarity of the formula by the vector result. At the same time, the model introduces the semantic features of context-rich mathematical formulas to improve the domain correlation of search results. The MathRetEval dataset was created based on about 7.9 × 105 arXiv documents and about 1.5 × 108 mathematical formulas. The scalability of the system is verified using this data set. The mathematical formulas can be written in the language TEX or MathML. When queried in the TEX language, it can be converted to a tree representation of the MathML representation and then indexed. This MFIRS is an information retrieval system for mathematical formulas with the features of mathematical perception, which can use the search for the similarity of partial formulas.
期刊介绍:
The major goal of the Journal of Computational Methods in Sciences and Engineering (JCMSE) is the publication of new research results on computational methods in sciences and engineering. Common experience had taught us that computational methods originally developed in a given basic science, e.g. physics, can be of paramount importance to other neighboring sciences, e.g. chemistry, as well as to engineering or technology and, in turn, to society as a whole. This undoubtedly beneficial practice of interdisciplinary interactions will be continuously and systematically encouraged by the JCMSE.