AutoMathKG: The Automated Mathematical Knowledge Graph Based on LLM and Vector Database
Authors: Rong Bian, Yu Geng, Zijian Yang, Bing Cheng
Journal: Computational Intelligence, volume 41, issue 4 (Journal Article)
Publication date: 2025-07-02
DOI: 10.1111/coin.70096
URL: https://onlinelibrary.wiley.com/doi/10.1111/coin.70096
Impact factor: 1.8; JCR: Q3 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
A mathematical knowledge graph (KG) presents mathematical knowledge in a structured manner. Constructing a math KG from natural-language sources is an essential but challenging task. Existing methods have two major limitations: incomplete knowledge due to limited corpora, and a lack of fully automated integration of diverse sources. This paper proposes AutoMathKG, a high-quality, wide-coverage, and multi-dimensional math KG capable of automatic updates. AutoMathKG regards mathematics as a vast directed graph composed of Definition, Theorem, and Problem entities, with their reference relationships as edges. It integrates knowledge from ProofWiki, textbooks, arXiv papers, and TheoremQA, and uses large language models (LLMs) to augment the data. To support searching for similar entities, a vector database, MathVD, is built using two purpose-designed embedding strategies. Two mechanisms are proposed for automatic updates. For knowledge completion, Math LLM is developed to interact with AutoMathKG and supply missing proofs or solutions. For knowledge fusion, MathVD retrieves similar entities, and an LLM determines whether to merge an incoming entity with a retrieved candidate or add it as a new entity. Extensive experiments demonstrate the performance of the AutoMathKG system, including better reachability query results in MathVD than five baselines and robust mathematical reasoning capability in Math LLM.
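The graph model and the fusion step described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the class and function names (`Entity`, `ToyMathKG`, `fuse`, `reachable`), the three-dimensional toy vectors, and the 0.95 threshold are all assumptions made for illustration. In particular, the abstract says an LLM decides whether to merge or add; here a cosine-similarity threshold stands in for that decision. Reachability is shown as plain breadth-first traversal over reference edges, which may differ from how the paper evaluates reachability queries in MathVD.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str                     # "Definition", "Theorem", or "Problem"
    vector: list                  # toy embedding standing in for a vector-database entry
    refs: set = field(default_factory=set)  # names of entities this one references


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyMathKG:
    """Directed graph of entities; edges are reference relationships."""

    def __init__(self, merge_threshold=0.95):
        self.entities = {}
        self.merge_threshold = merge_threshold  # assumed value, not from the paper

    def nearest(self, vector):
        """Return (name, similarity) of the closest stored entity, or (None, 0.0)."""
        best_name, best_sim = None, 0.0
        for name, entity in self.entities.items():
            sim = cosine(vector, entity.vector)
            if sim > best_sim:
                best_name, best_sim = name, sim
        return best_name, best_sim

    def fuse(self, candidate):
        """Merge the candidate into a near-duplicate, or add it as a new entity.

        The threshold test is a stand-in for the LLM judgment in the abstract.
        Returns the name under which the knowledge is stored.
        """
        name, sim = self.nearest(candidate.vector)
        if (name is not None and sim >= self.merge_threshold
                and self.entities[name].kind == candidate.kind):
            self.entities[name].refs |= candidate.refs  # keep the union of references
            return name
        self.entities[candidate.name] = candidate
        return candidate.name

    def reachable(self, start):
        """Names reachable from `start` by following reference edges (BFS)."""
        seen, queue = set(), deque([start])
        while queue:
            current = self.entities.get(queue.popleft())
            for nxt in (current.refs if current else ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


kg = ToyMathKG()
kg.fuse(Entity("Group", "Definition", [1.0, 0.0, 0.0]))
kg.fuse(Entity("Subgroup", "Definition", [0.0, 1.0, 0.0], {"Group"}))
kg.fuse(Entity("Lagrange's Theorem", "Theorem", [0.0, 0.0, 1.0], {"Subgroup"}))
# A near-duplicate with an almost identical vector is merged, not added.
stored = kg.fuse(Entity("Lagrange Theorem", "Theorem", [0.0, 0.01, 1.0], {"Group"}))
print(stored, sorted(kg.reachable("Lagrange's Theorem")))
```

In a real system the similarity search would be delegated to the vector database and the merge decision to a language model; the threshold here only makes the control flow concrete.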
About the journal:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.