MMTrans: MultiModal Transformer for realistic video virtual try-on

Xinrong Hu, Ziyi Zhang, Ruiqi Luo, Junjie Huang, Jinxing Liang, Jin Huang, Tao Peng, Hao Cai
DOI: 10.1145/3574131.3574431
Published in: Proceedings of the 18th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
Publication date: 2022-12-27
Citations: 1

Abstract

Video virtual try-on methods aim to generate coherent, smooth, and realistic try-on videos by matching the target clothing to the person in the video in a spatiotemporally consistent manner. Existing methods can match the human body with the clothing and present the result as video, but they ultimately suffer from excessive grid distortion and poor visual quality. We find this happens for three reasons: the relationships between the inputs are neglected, so some features are lost; conventional convolution struggles to capture the long-range information that is crucial for globally consistent results; and weak constraints on clothing texture detail lead to excessive deformation during TPS fitting, making many try-on methods look unrealistic in the final rendered video. To address these problems, we reduce excessive garment distortion during deformation by regularizing the TPS parameters with a constraint function, and we propose a multimodal two-stage combined Transformer. In the first stage, an interaction module models the long-range relationship between the person and the clothing, yielding better long-range correspondences and improving the TPS warping. In the second stage, an activation module establishes global dependencies that make the important regions of the input data more prominent, providing more natural intermediate inputs for the subsequent U-Net. Our method produces better results for video virtual try-on, and experiments on the VVT dataset show that it outperforms previous methods both quantitatively and qualitatively.
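The abstract does not give the exact form of the constraint function used to regularize the TPS parameters. A common choice in warping-based try-on work is a smoothness penalty on the TPS control-point offsets, so that neighboring grid points deform similarly and the garment is not distorted excessively; the sketch below (our assumption, not the paper's stated formula) penalizes second-order differences of the displacement field:

```python
import numpy as np

def tps_grid_regularizer(offsets):
    """Hedged sketch of a TPS smoothness constraint.

    offsets: (H, W, 2) array of control-point displacements.
    Returns the mean squared second-order difference along both grid
    axes, which discourages excessive local distortion of the warp:
    affine (uniform) displacements incur zero penalty, while sharp
    local kinks in the grid are penalized.
    """
    d2x = offsets[:, 2:, :] - 2 * offsets[:, 1:-1, :] + offsets[:, :-2, :]
    d2y = offsets[2:, :, :] - 2 * offsets[1:-1, :, :] + offsets[:-2, :, :]
    return float((d2x ** 2).mean() + (d2y ** 2).mean())

# A uniform displacement field has zero second differences, so the
# penalty is exactly zero; a randomly perturbed field is penalized.
H, W = 5, 5
uniform = np.tile(np.array([1.0, -0.5]), (H, W, 1))
assert tps_grid_regularizer(uniform) == 0.0
rng = np.random.default_rng(0)
perturbed = uniform + 0.1 * rng.standard_normal((H, W, 2))
assert tps_grid_regularizer(perturbed) > 0.0
```

In training, a term like this would be added to the try-on loss with a weighting coefficient, trading warp flexibility against grid smoothness.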
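The first-stage interaction module described in the abstract models long-range person-clothing relationships, which convolution's local receptive field cannot. The paper does not publish the module's exact equations; a minimal sketch consistent with that description is single-head cross-attention, where person tokens query clothing tokens (all names and shapes here are illustrative assumptions):

```python
import numpy as np

def cross_attention(person_feats, cloth_feats):
    """Single-head scaled dot-product cross-attention sketch.

    person_feats: (Np, d) query tokens from the person encoding.
    cloth_feats:  (Nc, d) key/value tokens from the clothing encoding.
    Every person token attends to every clothing token, so the module
    can capture long-range person-clothing correspondences regardless
    of spatial distance.
    """
    d = person_feats.shape[-1]
    scores = person_feats @ cloth_feats.T / np.sqrt(d)  # (Np, Nc)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over cloth tokens
    return weights @ cloth_feats                        # (Np, d)

rng = np.random.default_rng(1)
person = rng.standard_normal((6, 8))   # 6 person tokens, dim 8
cloth = rng.standard_normal((10, 8))   # 10 clothing tokens, dim 8
out = cross_attention(person, cloth)
assert out.shape == (6, 8)
```

The second-stage activation module would plausibly apply a similar attention over the combined input to weight important regions before the U-Net, though the abstract does not specify its form.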