自动几何图像数据集创建增强几何理解

IF 9.7 1区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia Pub Date : 2025-04-10 DOI:10.1109/TMM.2025.3557720

Zihan Huang;Tao Wu;Wang Lin;Shengyu Zhang;Jingyuan Chen;Fei Wu

{"title":"自动几何图像数据集创建增强几何理解","authors":"Zihan Huang;Tao Wu;Wang Lin;Shengyu Zhang;Jingyuan Chen;Fei Wu","doi":"10.1109/TMM.2025.3557720","DOIUrl":null,"url":null,"abstract":"With the rapid advancement of large language models, there has been a growing interest in their capabilities in mathematical reasoning. However, existing research has primarily focused on text-based algebra problems, neglecting the study of geometry due to the lack of high-quality geometric datasets. To address this gap, this paper introduces AutoGeo, a novel approach for automatically generating mathematical geometric images to fulfill the demand for large-scale and diverse geometric datasets. AutoGeo facilitates the creation of AutoGeo-100 k, an extensive repository comprising 100 k high-quality geometry image-text pairs. By leveraging precisely defined geometric clauses, AutoGeo-100 k contains a wide variety of geometric shapes, including lines, polygons, circles, and complex spatial relationships, etc. Furthermore, this paper demonstrates the efficacy of AutoGeo-100 k in enhancing the performance of multimodal large language models through fine-tuning. Experimental results indicate significant improvements in the model's ability in handling geometric images, as evidenced by enhanced accuracy in tasks such as geometric captioning and mathematical reasoning. This research not only fills a critical gap in the availability of geometric datasets but also paves the way for the advancement of sophisticated AI-driven tools in education and research.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"3105-3116"},"PeriodicalIF":9.7000,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding\",\"authors\":\"Zihan Huang;Tao Wu;Wang Lin;Shengyu Zhang;Jingyuan Chen;Fei Wu\",\"doi\":\"10.1109/TMM.2025.3557720\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the rapid advancement of large language models, there has been a growing interest in their capabilities in mathematical reasoning. However, existing research has primarily focused on text-based algebra problems, neglecting the study of geometry due to the lack of high-quality geometric datasets. To address this gap, this paper introduces AutoGeo, a novel approach for automatically generating mathematical geometric images to fulfill the demand for large-scale and diverse geometric datasets. AutoGeo facilitates the creation of AutoGeo-100 k, an extensive repository comprising 100 k high-quality geometry image-text pairs. By leveraging precisely defined geometric clauses, AutoGeo-100 k contains a wide variety of geometric shapes, including lines, polygons, circles, and complex spatial relationships, etc. Furthermore, this paper demonstrates the efficacy of AutoGeo-100 k in enhancing the performance of multimodal large language models through fine-tuning. Experimental results indicate significant improvements in the model's ability in handling geometric images, as evidenced by enhanced accuracy in tasks such as geometric captioning and mathematical reasoning. This research not only fills a critical gap in the availability of geometric datasets but also paves the way for the advancement of sophisticated AI-driven tools in education and research.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"27 \",\"pages\":\"3105-3116\"},\"PeriodicalIF\":9.7000,\"publicationDate\":\"2025-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10960701/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10960701/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

随着大型语言模型的快速发展，人们对它们在数学推理中的能力越来越感兴趣。然而，由于缺乏高质量的几何数据集，现有的研究主要集中在基于文本的代数问题上，而忽视了几何问题的研究。为了解决这一问题，本文介绍了自动生成数学几何图像的新方法autogo，以满足对大规模和多样化几何数据集的需求。augeto促进了augeto - 100k的创建，这是一个包含100k高质量几何图像-文本对的广泛存储库。通过利用精确定义的几何条款，augego - 100k包含各种几何形状，包括线，多边形，圆和复杂的空间关系等。此外，本文还证明了autogo - 100k通过微调来提高多模态大型语言模型的性能。实验结果表明，该模型在处理几何图像方面的能力有了显著提高，在几何标题和数学推理等任务上的准确性得到了提高。这项研究不仅填补了几何数据集可用性方面的关键空白，而且为先进的人工智能驱动工具在教育和研究中的发展铺平了道路。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding

With the rapid advancement of large language models, there has been a growing interest in their capabilities in mathematical reasoning. However, existing research has primarily focused on text-based algebra problems, neglecting the study of geometry due to the lack of high-quality geometric datasets. To address this gap, this paper introduces AutoGeo, a novel approach for automatically generating mathematical geometric images to fulfill the demand for large-scale and diverse geometric datasets. AutoGeo facilitates the creation of AutoGeo-100 k, an extensive repository comprising 100 k high-quality geometry image-text pairs. By leveraging precisely defined geometric clauses, AutoGeo-100 k contains a wide variety of geometric shapes, including lines, polygons, circles, and complex spatial relationships, etc. Furthermore, this paper demonstrates the efficacy of AutoGeo-100 k in enhancing the performance of multimodal large language models through fine-tuning. Experimental results indicate significant improvements in the model's ability in handling geometric images, as evidenced by enhanced accuracy in tasks such as geometric captioning and mathematical reasoning. This research not only fills a critical gap in the availability of geometric datasets but also paves the way for the advancement of sophisticated AI-driven tools in education and research.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Multimedia 工程技术-电信学

CiteScore

11.70

自引率

11.00%

发文量

576

审稿时长

5.5 months

期刊介绍： The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.