{"title":"WikiMirs 3.0: A Hybrid MIR System Based on the Context, Structure and Importance of Formulae in a Document","authors":"Yuehan Wang, Liangcai Gao, Simeng Wang, Zhi Tang, Xiaozhong Liu, Ke Yuan","doi":"10.1145/2756406.2756918","DOIUrl":null,"url":null,"abstract":"Nowadays, mathematical information is increasingly available in websites and repositories, such like ArXiv, Wikipedia and growing numbers of digital libraries. Mathematical formulae are highly structured and usually presented in layout presentations, such as PDF, LATEX and Presentation MathML. The differences of presentation between text and formulae challenge traditional text-based index and retrieval methods. To address the challenge, this paper proposes an upgraded Mathematical Information Retrieval (MIR) system, namely WikiMirs 3.0, based on the context, structure and importance of formulae in a document. In WikiMirs 3.0, users can easily \"cut\" formulae and contexts from PDF documents as well as type in queries. Furthermore, a novel hybrid indexing and matching model is proposed to support both exact and fuzzy matching. In the hybrid model, both context and structure information of formulae are taken into consideration. In addition, the concept of formula importance within a document is introduced into the model for more reasonable ranking. Experimental results, compared with two classical MIR systems, demonstrate that the proposed system along with the novel model provides higher accuracy and better ranking results over Wikipedia.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2756406.2756918","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24
Abstract
Nowadays, mathematical information is increasingly available in websites and repositories, such like ArXiv, Wikipedia and growing numbers of digital libraries. Mathematical formulae are highly structured and usually presented in layout presentations, such as PDF, LATEX and Presentation MathML. The differences of presentation between text and formulae challenge traditional text-based index and retrieval methods. To address the challenge, this paper proposes an upgraded Mathematical Information Retrieval (MIR) system, namely WikiMirs 3.0, based on the context, structure and importance of formulae in a document. In WikiMirs 3.0, users can easily "cut" formulae and contexts from PDF documents as well as type in queries. Furthermore, a novel hybrid indexing and matching model is proposed to support both exact and fuzzy matching. In the hybrid model, both context and structure information of formulae are taken into consideration. In addition, the concept of formula importance within a document is introduced into the model for more reasonable ranking. Experimental results, compared with two classical MIR systems, demonstrate that the proposed system along with the novel model provides higher accuracy and better ranking results over Wikipedia.