H.H. Nguyen, M.N. Vu, F. Beck, G. Ebmer, A. Nguyen, W. Kemmetmueller, A. Kugi
Mechatronics, Volume 109, Article 103335. DOI: 10.1016/j.mechatronics.2025.103335. Published 2025-05-12 (Journal Article; JCR Q2, Automation & Control Systems).
Language-driven closed-loop grasping with model-predictive trajectory optimization
Integrating a vision module into a closed-loop control system for the seamless movement of a robot in a manipulation task is challenging due to the inconsistent update rates of the modules involved. The task is even more difficult in a dynamic environment, e.g., when objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects through a closed-loop control system with real-time trajectory replanning and online 6D object pose localization. We segment an object within 0.5 s by leveraging a vision-language model via language commands. Then, guided by natural language commands, a closed-loop system, comprising unified pose estimation and tracking and online trajectory planning, continuously tracks this object and computes the optimal trajectory in real time. The proposed zero-shot framework provides a smooth trajectory that avoids jerky movements and ensures the robot can grasp a non-stationary object. Experimental results demonstrate the real-time capability of the proposed zero-shot modular framework to grasp moving objects accurately and efficiently. The framework achieves update rates of up to 30 Hz for the online 6D pose localization module and 10 Hz for the receding-horizon trajectory optimization. These advantages highlight the modular framework’s potential applications in robotics and human–robot interaction; see the video at language-driven-grasping.github.io.
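The core scheduling idea in the abstract — modules running at inconsistent rates, with the slower receding-horizon planner always consuming the freshest output of the faster 6D pose tracker — can be sketched as follows. This is a hypothetical simulation, not the authors' implementation: the function names (`track_pose`, `replan`), the placeholder object motion, and the straight-line waypoint "plan" are all illustrative; only the 30 Hz / 10 Hz rates come from the paper.

```python
POSE_RATE_HZ = 30  # online 6D pose localization (rate reported in the paper)
PLAN_RATE_HZ = 10  # receding-horizon trajectory optimization (rate reported)

def track_pose(t):
    """Placeholder pose estimate: a hypothetical object drifting along x."""
    return (0.1 * t, 0.0, 0.3)  # (x, y, z)

def replan(pose, horizon=5):
    """Placeholder receding-horizon plan: straight-line waypoints to the pose."""
    return [tuple(c * (k + 1) / horizon for c in pose) for k in range(horizon)]

def run(duration_s=1.0, dt=0.001):
    """Step a discrete clock; fire each module whenever its period elapses.

    The planner never blocks on the tracker: it simply reads the most recent
    pose, so the two loops stay decoupled despite their different rates.
    """
    next_pose_t = next_plan_t = 0.0
    latest_pose, plan = None, None
    pose_calls = plan_calls = 0
    t = 0.0
    while t < duration_s:
        if t >= next_pose_t:
            latest_pose = track_pose(t)
            pose_calls += 1
            next_pose_t += 1.0 / POSE_RATE_HZ
        if t >= next_plan_t and latest_pose is not None:
            plan = replan(latest_pose)  # always uses the freshest pose
            plan_calls += 1
            next_plan_t += 1.0 / PLAN_RATE_HZ
        t += dt
    return pose_calls, plan_calls, plan

pose_calls, plan_calls, plan = run()
```

Over one simulated second the tracker fires 30 times and the planner 10 times; in the real system each module would additionally run in its own thread or process, but the decoupling principle is the same.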
Journal introduction:
Mechatronics is the synergistic combination of precision mechanical engineering, electronic control and systems thinking in the design of products and manufacturing processes. It relates to the design of systems, devices and products aimed at achieving an optimal balance between basic mechanical structure and its overall control. The purpose of this journal is to provide rapid publication of topical papers featuring practical developments in mechatronics. It will cover a wide range of application areas including consumer product design, instrumentation, manufacturing methods, computer integration and process and device control, and will attract a readership from across the industrial and academic research spectrum. Particular importance will be attached to aspects of innovation in mechatronics design philosophy which illustrate the benefits obtainable by an a priori integration of functionality with embedded microprocessor control. A major item will be the design of machines, devices and systems possessing a degree of computer based intelligence. The journal seeks to publish research progress in this field with an emphasis on the applied rather than the theoretical. It will also serve the dual role of bringing greater recognition to this important area of engineering.