{"title":"利用长短期记忆网络识别公式蕴涵","authors":"Amarnath Pathak, Partha Pakray","doi":"10.1177/01655515231184826","DOIUrl":null,"url":null,"abstract":"The article presents an approach to recognise formula entailment, which concerns finding entailment relationships between pairs of math formulae. As the current formula-similarity-detection approaches fail to account for broader relationships between pairs of math formulae, recognising formula entailment becomes paramount. To this end, a long short-term memory (LSTM) neural network using symbol-by-symbol attention for recognising formula entailment is implemented. However, owing to the unavailability of relevant training and validation corpora, the first and foremost step is to create a sufficiently large-sized symbol-level MATHENTAIL data set in an automated fashion. Depending on the extent of similarity between the corresponding symbol embeddings, the symbol pairs in the MATHENTAIL data set are assigned ‘entailment’ or ‘neutral’ labels. An improved symbol-to-vector (isymbol2vec) method generates mathematical symbols (in LATEX) and their embeddings using the Wikipedia corpus of scientific documents and Continuous Bag of Words (CBOW) architecture. Eventually, the LSTM network, trained and validated using the MATHENTAIL data set, predicts formulae entailment for test formulae pairs with a reasonable accuracy of 62.2%.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Recognising formula entailment using long short-term memory network\",\"authors\":\"Amarnath Pathak, Partha Pakray\",\"doi\":\"10.1177/01655515231184826\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The article presents an approach to recognise formula entailment, which concerns finding entailment relationships between pairs of math formulae. As the current formula-similarity-detection approaches fail to account for broader relationships between pairs of math formulae, recognising formula entailment becomes paramount. To this end, a long short-term memory (LSTM) neural network using symbol-by-symbol attention for recognising formula entailment is implemented. However, owing to the unavailability of relevant training and validation corpora, the first and foremost step is to create a sufficiently large-sized symbol-level MATHENTAIL data set in an automated fashion. Depending on the extent of similarity between the corresponding symbol embeddings, the symbol pairs in the MATHENTAIL data set are assigned ‘entailment’ or ‘neutral’ labels. An improved symbol-to-vector (isymbol2vec) method generates mathematical symbols (in LATEX) and their embeddings using the Wikipedia corpus of scientific documents and Continuous Bag of Words (CBOW) architecture. 
Eventually, the LSTM network, trained and validated using the MATHENTAIL data set, predicts formulae entailment for test formulae pairs with a reasonable accuracy of 62.2%.\",\"PeriodicalId\":54796,\"journal\":{\"name\":\"Journal of Information Science\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-07-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1177/01655515231184826\",\"RegionNum\":4,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1177/01655515231184826","RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Recognising formula entailment using long short-term memory network
The article presents an approach to recognising formula entailment, which concerns finding entailment relationships between pairs of math formulae. Because current formula-similarity-detection approaches fail to account for broader relationships between pairs of math formulae, recognising formula entailment becomes paramount. To this end, a long short-term memory (LSTM) neural network using symbol-by-symbol attention for recognising formula entailment is implemented. However, owing to the unavailability of relevant training and validation corpora, the first and foremost step is to create a sufficiently large symbol-level MATHENTAIL data set in an automated fashion. Depending on the degree of similarity between the corresponding symbol embeddings, the symbol pairs in the MATHENTAIL data set are assigned 'entailment' or 'neutral' labels. An improved symbol-to-vector (isymbol2vec) method extracts mathematical symbols (in LaTeX) from a Wikipedia corpus of scientific documents and learns their embeddings using the Continuous Bag-of-Words (CBOW) architecture. Eventually, the LSTM network, trained and validated using the MATHENTAIL data set, predicts formula entailment for test formula pairs with a reasonable accuracy of 62.2%.
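The abstract's data-set construction step can be illustrated in miniature. The sketch below (not the authors' code) trains CBOW embeddings for LaTeX symbols with gensim's Word2Vec and labels a symbol pair 'entailment' or 'neutral' from the cosine similarity of the two embeddings; the toy corpus, the 0.5 threshold and the example symbols are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of embedding-similarity-based labelling, assuming gensim is available.
from gensim.models import Word2Vec

# Each "sentence" is a tokenised sequence of LaTeX symbols from a formula (toy data).
corpus = [
    ["\\alpha", "+", "\\beta", "=", "\\gamma"],
    ["\\sum", "_", "i", "x", "_", "i"],
    ["\\alpha", "^", "2", "+", "\\beta", "^", "2"],
]

# sg=0 selects the Continuous Bag-of-Words (CBOW) training architecture.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=0, epochs=50)

SIM_THRESHOLD = 0.5  # hypothetical cut-off between 'entailment' and 'neutral'

def label_pair(sym_a: str, sym_b: str) -> str:
    """Assign a label to a symbol pair from the cosine similarity of their embeddings."""
    sim = model.wv.similarity(sym_a, sym_b)
    return "entailment" if sim >= SIM_THRESHOLD else "neutral"

print(label_pair("\\alpha", "\\beta"))
```

In the paper itself, the isymbol2vec embeddings are learned from a Wikipedia corpus of scientific documents rather than a toy corpus, and the resulting labelled symbol pairs feed the attention-based LSTM; the snippet only illustrates the similarity-thresholding idea behind the MATHENTAIL labels.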
Journal introduction:
The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.