Dimitris Angelis, Prodromos Kolyvakis, Manos Kamarianakis, George Papagiannakis
arXiv - CS - Graphics, published 2024-08-05. DOI: arxiv-2408.02275 (https://doi.org/arxiv-2408.02275)
Geometric Algebra Meets Large Language Models: Instruction-Based Transformations of Separate Meshes in 3D, Interactive and Controllable Scenes
This paper introduces a novel integration of Large Language Models (LLMs)
with Conformal Geometric Algebra (CGA) to revolutionize controllable 3D scene
editing, particularly for object repositioning tasks, which traditionally
require intricate manual processes and specialized expertise. These
conventional methods typically suffer from reliance on large training datasets
or lack a formalized language for precise edits. Utilizing CGA as a robust
formal language, our system, shenlong, precisely models spatial transformations
necessary for accurate object repositioning. Leveraging the zero-shot learning
capabilities of pre-trained LLMs, shenlong translates natural language
instructions into CGA operations which are then applied to the scene,
facilitating exact spatial transformations within 3D scenes without the need
for specialized pre-training. Implemented in a realistic simulation
environment, shenlong ensures compatibility with existing graphics pipelines.
To accurately assess the impact of CGA, we benchmark against robust
Euclidean-space baselines, evaluating both latency and accuracy. Comparative performance
evaluations indicate that shenlong reduces LLM response times by
16% and raises success rates by 9.6% on average compared to traditional
methods. Notably, shenlong achieves a 100% success rate on common
practical queries, a benchmark where other systems fall short. These
advancements underscore shenlong's potential to democratize 3D scene editing,
enhancing accessibility and fostering innovation across sectors such as
education, digital entertainment, and virtual reality.
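
To make the idea of CGA operations as a formal language for repositioning more concrete, the following is a minimal sketch of how a translation and a rotation can be written as CGA versors and applied to a point with the sandwich product. It is not the shenlong implementation: the helper names (`up`, `down`, `translator`, `rotor_z`, `apply`), the bitmask blade encoding, and the sign conventions (e_o · e_inf = -1, translator T = 1 - t e_inf / 2) are assumptions chosen for illustration, following one common textbook convention.

```python
import math
from itertools import product

# Illustrative sketch only; all names below are hypothetical helpers.
# Basis vectors e1, e2, e3, e+, e- are bits 0..4; signature (+,+,+,+,-).
METRIC = [1, 1, 1, 1, -1]
E_PLUS, E_MINUS = 1 << 3, 1 << 4
E_INF = {E_MINUS: 1.0, E_PLUS: 1.0}  # e_inf = e- + e+ (a null vector)

def blade_product(a, b):
    """Geometric product of two basis blades (bitmasks): (sign, blade)."""
    sign = 1
    t = a >> 1
    while t:  # parity of swaps needed to reach canonical order
        if bin(t & b).count("1") % 2:
            sign = -sign
        t >>= 1
    common = a & b
    for i, m in enumerate(METRIC):  # repeated vectors square to the metric
        if common >> i & 1:
            sign *= m
    return sign, a ^ b

def gp(x, y):
    """Geometric product of multivectors stored as {blade: coefficient}."""
    out = {}
    for (a, ca), (b, cb) in product(x.items(), y.items()):
        s, blade = blade_product(a, b)
        out[blade] = out.get(blade, 0.0) + s * ca * cb
    return out

def reverse(x):
    """Reversion: flips the sign of grades 2 and 3 (mod 4)."""
    return {b: (-c if (bin(b).count("1") // 2) % 2 else c) for b, c in x.items()}

def up(x):
    """Embed a Euclidean point: X = x + 0.5 |x|^2 e_inf + e_o."""
    h = 0.5 * sum(c * c for c in x)
    X = {1 << i: float(c) for i, c in enumerate(x)}
    X[E_PLUS] = h - 0.5   # e_o = 0.5 (e- - e+)
    X[E_MINUS] = h + 0.5
    return X

def down(X):
    """Recover the Euclidean point, normalising by -X . e_inf."""
    w = X.get(E_MINUS, 0.0) - X.get(E_PLUS, 0.0)
    return tuple(X.get(1 << i, 0.0) / w for i in range(3))

def translator(t):
    """Translator T = 1 - 0.5 t e_inf for a Euclidean offset t."""
    T = gp({1 << i: -0.5 * ti for i, ti in enumerate(t)}, E_INF)
    T[0] = T.get(0, 0.0) + 1.0
    return T

def rotor_z(theta):
    """Rotor R = cos(theta/2) - sin(theta/2) e12 (rotation about the z axis)."""
    return {0: math.cos(theta / 2), 0b11: -math.sin(theta / 2)}

def apply(M, X):
    """Sandwich product M X ~M."""
    return gp(gp(M, X), reverse(M))

# A motor M = T R rotates first, then translates.
motor = gp(translator((0.0, 0.0, 2.0)), rotor_z(math.pi / 2))
moved = down(apply(motor, up((1.0, 0.0, 0.0))))
```

In this sketch, rotations and translations are both versors, so they compose by a single geometric product and are applied uniformly through `apply`. An instruction such as "rotate the object a quarter turn and lift it by two units" then maps to one motor rather than to a separately assembled matrix.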