Learning Physically Simulated Tennis Skills from Broadcast Videos

ACM Transactions on Graphics (TOG) | Publication date: 2023-07-26 | DOI: 10.1145/3592408

Haotian Zhang, Ye Yuan, Viktor Makoviychuk, Yunrong Guo, S. Fidler, X. B. Peng, K. Fatahalian

Citations: 10

Abstract

We present a system that learns diverse, physically simulated tennis skills from large-scale demonstrations of tennis play harvested from broadcast videos. Our approach is built upon hierarchical models, combining a low-level imitation policy and a high-level motion planning policy to steer the character in a motion embedding learned from broadcast videos. When deployed at scale on large video collections that encompass a vast set of examples of real-world tennis play, our approach can learn complex tennis shotmaking skills and realistically chain together multiple shots into extended rallies, using only simple rewards and without explicit annotations of stroke types. To address the low quality of motions extracted from broadcast videos, we correct estimated motion with physics-based imitation, and use a hybrid control policy that overrides erroneous aspects of the learned motion embedding with corrections predicted by the high-level policy. We demonstrate that our system produces controllers for physically simulated tennis players that can hit the incoming ball to target positions accurately using a diverse array of strokes (serves, forehands, and backhands), spins (topspins and slices), and playing styles (one/two-handed backhands, left/right-handed play). Overall, our system can synthesize two physically simulated characters playing extended tennis rallies with simulated racket and ball dynamics. Code and data for this work are available at https://research.nvidia.com/labs/toronto-ai/vid2player3d/.
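The abstract describes a hierarchical, hybrid control scheme. A high-level policy steers the character through a learned motion embedding and also overrides the parts of the decoded motion it cannot trust. A low-level imitation policy then tracks the resulting target. The sketch below illustrates that data flow under loudly stated assumptions and is not the authors' implementation, which is available at the URL above. It makes four simplifications. Poses are one scalar per joint rather than 3D rotations. The embedding's decoder is a toy function. The overridden joints (`override_joints`, for example a wrist joint) are chosen purely for illustration. The learned low-level imitation policy is replaced by a plain PD tracking rule. All names (`HighLevelOutput`, `hybrid_target_pose`, `pd_torques`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical simplification: one scalar angle per joint instead of full 3D rotations.
Pose = List[float]


@dataclass
class HighLevelOutput:
    """Hypothetical per-step output of the high-level (motion planning) policy."""
    latent: List[float]      # code that steers the character in the motion embedding
    correction: List[float]  # replacement values for the overridden joints


def hybrid_target_pose(
    decoder: Callable[[Pose, List[float]], Pose],
    prev_pose: Pose,
    hl: HighLevelOutput,
    override_joints: Sequence[int],
) -> Pose:
    """Decode the next kinematic target from the embedding, then overwrite the
    joints assumed to be unreliable with the high-level policy's corrections."""
    if len(hl.correction) != len(override_joints):
        raise ValueError("expected one correction value per overridden joint")
    target = list(decoder(prev_pose, hl.latent))  # copy so the decoder output is not mutated
    for joint, value in zip(override_joints, hl.correction):
        target[joint] = value
    return target


def pd_torques(pose: Pose, velocity: Pose, target: Pose, kp: float, kd: float) -> List[float]:
    """Stand-in for the learned low-level imitation policy: a plain PD rule that
    pulls each joint toward the target pose while damping its velocity."""
    return [kp * (t - q) - kd * v for q, v, t in zip(pose, velocity, target)]


if __name__ == "__main__":
    # Toy decoder (hypothetical): treat the latent code as a per-joint pose offset.
    def toy_decoder(prev: Pose, z: List[float]) -> Pose:
        return [p + dz for p, dz in zip(prev, z)]

    step = HighLevelOutput(latent=[0.5, 0.25, -0.5, 0.0], correction=[1.0])
    target = hybrid_target_pose(toy_decoder, [0.0] * 4, step, override_joints=[3])
    print(target)  # [0.5, 0.25, -0.5, 1.0]
    print(pd_torques([0.0] * 4, [0.0, 0.0, 1.0, 0.0], target, kp=10.0, kd=2.0))
    # [5.0, 2.5, -7.0, 10.0]
```

In this sketch the override is applied after decoding. The embedding still supplies the overall body motion, while the high-level policy has direct authority over the joints whose video-derived motion is assumed to be least trustworthy.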