AnnoMathTeX - a formula identifier annotation recommender system for STEM documents
Philipp Scharpf, Ian Mackerracher, M. Schubotz, J. Beel, Corinna Breitinger, Bela Gipp
Proceedings of the 13th ACM Conference on Recommender Systems, 2019-09-10
DOI: 10.1145/3298689.3347042 (https://doi.org/10.1145/3298689.3347042)
Citations: 21
Abstract
Documents from science, technology, engineering, and mathematics (STEM) often contain a large number of mathematical formulae alongside text. Semantic search, recommender, and question answering systems require the constants and variables (identifiers) occurring in these formulae to be disambiguated. We present a first implementation of a recommender system that enables and accelerates formula annotation by displaying the most likely candidates for formula and identifier names from four different sources (arXiv, Wikipedia, Wikidata, and the surrounding text). A first evaluation shows that, in total, 78% of the recommended formula identifier names were accepted by the user as suitable annotations. Furthermore, document-wide annotation spared the user from annotating ten times as many other identifier occurrences by hand. Our long-term vision is to integrate the annotation recommender into the edit view of Wikipedia and the online LaTeX editor Overleaf.
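The two mechanisms named in the abstract, recommending names from several sources and applying an accepted name document-wide, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the AnnoMathTeX implementation: the function names `merge_candidates` and `annotate_document_wide`, the reciprocal-rank scoring used to combine the source lists, the rule that an accepted name is applied to every occurrence of the same symbol, and the example candidate lists for the identifier `E`.

```python
from collections import defaultdict


def merge_candidates(source_rankings, top_k=5):
    """Merge per-source ranked name lists into one ranked list.

    Scoring is reciprocal-rank voting, an assumption made for this sketch;
    the abstract does not say how the four sources are combined.
    """
    scores = defaultdict(float)
    for ranking in source_rankings.values():
        for rank, name in enumerate(ranking, start=1):
            scores[name] += 1.0 / rank
    # Sort by descending score, then alphabetically for a stable order.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ordered[:top_k]]


def annotate_document_wide(identifiers, symbol, name, annotations=None):
    """Apply one accepted name to every occurrence of `symbol`.

    Returns the annotation map (position -> name) and the number of
    additional occurrences the user did not have to annotate by hand.
    """
    annotations = dict(annotations or {})
    positions = [i for i, s in enumerate(identifiers) if s == symbol]
    for pos in positions:
        annotations[pos] = name
    saved = max(len(positions) - 1, 0)
    return annotations, saved


# Hypothetical candidate lists for the identifier "E", one per source.
rankings = {
    "arXiv": ["energy", "electric field", "expectation value"],
    "Wikipedia": ["energy", "expectation value"],
    "Wikidata": ["energy", "elastic modulus"],
    "surrounding text": ["electric field", "energy"],
}
recommendations = merge_candidates(rankings, top_k=3)

# Hypothetical identifier sequence extracted from one document.
identifiers = ["E", "m", "c", "E", "p", "E"]
annotations, saved = annotate_document_wide(identifiers, "E", recommendations[0])
```

Here the user accepts the top recommendation once, and the sketch copies it to the remaining occurrences of `E`; the `saved` counter corresponds loosely to the "other identifier occurrences" the abstract reports as spared.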