AutoMathKG: The Automated Mathematical Knowledge Graph Based on LLM and Vector Database

IF 1.7, Zone 4 (Computer Science), JCR Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Intelligence, Pub Date: 2025-07-02, DOI: 10.1111/coin.70096

Rong Bian, Yu Geng, Zijian Yang, Bing Cheng

Volume 41, Issue 4. Publisher page: https://onlinelibrary.wiley.com/doi/10.1111/coin.70096

Citations: 0

Abstract

A mathematical knowledge graph (KG) represents mathematical knowledge in a structured form. Constructing a math KG from natural language is an essential but challenging task. Existing methods have two major limitations: incomplete knowledge due to limited corpora, and a lack of fully automated integration of diverse sources. This paper proposes AutoMathKG, a high-quality, wide-coverage, multi-dimensional math KG capable of automatic updates. AutoMathKG models mathematics as a vast directed graph whose nodes are Definition, Theorem, and Problem entities and whose edges are the reference relationships between them. It integrates knowledge from ProofWiki, textbooks, arXiv papers, and TheoremQA, and uses large language models (LLMs) for data augmentation. To search for similar entities, the authors build MathVD, a vector database, using two purpose-designed embedding strategies. Two mechanisms support automatic updates. For knowledge completion, Math LLM is developed to interact with AutoMathKG and supply missing proofs or solutions. For knowledge fusion, MathVD retrieves similar entities, and an LLM determines whether to merge the incoming entity with a candidate or add it as a new entity. Extensive experiments demonstrate the strong performance of the AutoMathKG system, including reachability query results in MathVD that surpass five baselines and robust mathematical reasoning capability in Math LLM.
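The abstract describes a directed graph of Definition, Theorem, and Problem entities linked by reference edges, a vector database for finding similar entities, and an LLM that decides whether an incoming entity should be merged with a retrieved candidate. The Python sketch below illustrates that flow under heavy simplifying assumptions and is not the authors' implementation. The names `Entity`, `MathKG`, `embed`, `fuse`, `reachable`, and the 0.9 threshold are hypothetical. A hashed bag-of-words vector stands in for the paper's two embedding strategies, brute-force cosine search stands in for MathVD, and a same-kind similarity threshold stands in for the LLM merge decision.

```python
from __future__ import annotations

import math
import zlib
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Entity:
    name: str
    kind: str                                    # "Definition", "Theorem" or "Problem"
    text: str
    refs: set[str] = field(default_factory=set)  # outgoing reference edges (names)


def embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for a learned embedding: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode()) % dim] += 1.0  # crc32 keeps hashing deterministic
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class MathKG:
    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        self.vectors: dict[str, list[float]] = {}  # toy in-memory "vector database"

    def add(self, e: Entity) -> None:
        self.entities[e.name] = e
        self.vectors[e.name] = embed(e.text)

    def search(self, text: str, k: int = 3) -> list[tuple[str, float]]:
        """Brute-force top-k search; vectors are unit length, so the dot product is the cosine."""
        q = embed(text)
        scored = [(n, sum(a * b for a, b in zip(q, v))) for n, v in self.vectors.items()]
        return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

    def fuse(self, e: Entity,
             decide: Optional[Callable[[Entity, Entity, float], bool]] = None,
             threshold: float = 0.9) -> str:
        """Merge e into its nearest neighbour or add it as new; returns the resulting entity name."""
        if decide is None:
            # Placeholder for the LLM judgment: merge only same-kind near-duplicates.
            def decide(new: Entity, cand: Entity, score: float) -> bool:
                return new.kind == cand.kind and score >= threshold
        hits = self.search(e.text, k=1)
        if hits:
            cand = self.entities[hits[0][0]]
            if decide(e, cand, hits[0][1]):
                cand.refs |= e.refs  # keep the union of reference edges
                return cand.name
        self.add(e)
        return e.name

    def reachable(self, start: str) -> set[str]:
        """All entities transitively referenced from start, found by breadth-first search."""
        seen: set[str] = set()
        queue = deque(self.entities[start].refs)
        while queue:
            name = queue.popleft()
            if name in seen or name not in self.entities:
                continue
            seen.add(name)
            queue.extend(self.entities[name].refs)
        return seen


kg = MathKG()
kg.add(Entity("group", "Definition",
              "a group is a set with an associative binary operation identity and inverses"))
kg.add(Entity("lagrange", "Theorem",
              "the order of a subgroup of a finite group divides the order of the group", {"group"}))
kg.add(Entity("p1", "Problem", "show that a group of prime order is cyclic", {"lagrange"}))
print(kg.search("order of a subgroup divides order of the group", k=1))
print(kg.fuse(Entity("lagrange_v2", "Theorem",
                     "the order of a subgroup of a finite group divides the order of the group")))
print(sorted(kg.reachable("p1")))
```

Passing a `decide` function that prompts an LLM with both entity texts would bring the sketch closer to the fusion step the abstract describes. `reachable` is a plain transitive walk over reference edges; the abstract does not define the reachability query used in the paper's evaluation.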

Source journal

Computational Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.90

Self-citation rate: 3.60%

Review time: >12 weeks

Journal description: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues, from the tools and languages of AI to its philosophical implications, Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.