Virtual Reality Intelligent Hardware: Latest Articles

Pre-training transformer with dual-branch context content module for table detection in document images
Virtual Reality Intelligent Hardware Pub Date : 2024-10-01 DOI: 10.1016/j.vrih.2024.06.003
Yongzhi Li, Pengle Zhang, Meng Sun, Jin Huang, Ruhan He
Abstract
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information.
Methods: Therefore, we propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability.
Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results on evaluation metrics such as recall and F1-score.
Code: https://github.com/YongZ-Lee/TD-DCCAM
Vol. 6, No. 5, pp. 408-420
Citations: 0
Co-salient object detection with iterative purification and predictive optimization
Virtual Reality Intelligent Hardware Pub Date : 2024-10-01 DOI: 10.1016/j.vrih.2024.06.002
Yang Wen, Yuhuan Wang, Hao Wang, Wuzhen Shi, Wenming Cao
Abstract
Background: Co-salient object detection (Co-SOD) aims to identify and segment commonly salient objects in a set of related images. However, most current Co-SOD methods encounter issues with the inclusion of irrelevant information in the co-representation. These issues hamper their ability to locate co-salient objects and significantly restrict the accuracy of detection.
Methods: To address this issue, this study introduces a novel Co-SOD method with iterative purification and predictive optimization (IPPO) comprising a common salient purification module (CSPM), predictive optimizing module (POM), and diminishing mixed enhancement block (DMEB).
Results: These components are designed to explore noise-free joint representations, assist the model in enhancing the quality of the final prediction results, and significantly improve the performance of the Co-SOD algorithm. Furthermore, through a comprehensive evaluation of IPPO and state-of-the-art algorithms focusing on the roles of CSPM, POM, and DMEB, our experiments confirmed that these components are pivotal in enhancing the performance of the model, substantiating the significant advancements of our method over existing benchmarks. Experiments on several challenging benchmark co-saliency datasets demonstrate that the proposed IPPO achieves state-of-the-art performance.
Vol. 6, No. 5, pp. 396-407
Citations: 0
Music-stylized hierarchical dance synthesis with user control
Virtual Reality Intelligent Hardware Pub Date : 2024-10-01 DOI: 10.1016/j.vrih.2024.06.004
Yanbo Cheng, Yichen Jiang, Yingying Wang
Abstract
Background: Synthesizing dance motions to match musical inputs is a significant challenge in animation research. Compared to functional human motions, such as locomotion, dance motions are creative and artistic, often influenced by music, and can be independent body language expressions. Dance choreography requires motion content to follow a general dance genre, whereas dance performances under musical influence are infused with diverse impromptu motion styles. Considering the high expressiveness and variations in space and time, providing accessible and effective user control for tuning dance motion styles remains an open problem.
Methods: In this study, we present a hierarchical framework that decouples the dance synthesis task into independent modules. We use a high-level choreography module built as a Transformer-based sequence model to predict the long-term structure of a dance genre and a low-level realization module that implements dance stylization and synchronization to match the musical input or user preferences. This novel framework allows the individual modules to be trained separately. Because of the decoupling, dance composition can fully utilize existing high-quality dance datasets that do not have musical accompaniments, and the dance implementation can conveniently incorporate user controls and edit motions through a decoder network. Each module is replaceable at runtime, which adds flexibility to the synthesis of dance sequences.
Results: Synthesized results demonstrate that our framework generates high-quality diverse dance motions that are well adapted to varying musical conditions and user controls.
Vol. 6, No. 5, pp. 339-357
Citations: 0
Mesh representation matters: investigating the influence of different mesh features on perceptual and spatial fidelity of deep 3D morphable models
Virtual Reality Intelligent Hardware Pub Date : 2024-10-01 DOI: 10.1016/j.vrih.2024.08.006
Robert KOSK, Richard SOUTHERN, Lihua YOU, Shaojun BIAN, Willem KOKKE, Greg MAGUIRE
Abstract
Background: Deep 3D morphable models (deep 3DMMs) play an essential role in computer vision. They are used in facial synthesis, compression, reconstruction and animation, avatar creation, virtual try-on, facial recognition systems and medical imaging. These applications require high spatial and perceptual quality of synthesised meshes. Despite their significance, these models have not been compared with different mesh representations and evaluated jointly with point-wise distance and perceptual metrics.
Methods: We compare how different mesh representation features, supplied to various deep 3DMMs, affect the spatial and perceptual fidelity of the reconstructed meshes. This paper proves the hypothesis that building deep 3DMMs from meshes represented with global representations leads to lower spatial reconstruction error measured with $L_1$ and $L_2$ norm metrics and underperforms on perceptual metrics. In contrast, using differential mesh representations, which describe differential surface properties, yields lower perceptual FMPD and DAME and higher spatial fidelity error. The influence of mesh feature normalisation and standardisation is also compared and analysed from perceptual and spatial fidelity perspectives.
Results: The results presented in this paper provide guidance in selecting mesh representations to build deep 3DMMs according to spatial and perceptual quality objectives and propose combinations of mesh representations and deep 3DMMs which improve either perceptual or spatial fidelity of existing methods.
Vol. 6, No. 5, pp. 383-395
Citations: 0
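The mesh-representation abstract above measures spatial reconstruction error with L1 and L2 norm metrics. As a minimal sketch of what such point-wise metrics might look like, the function below averages per-vertex L1 and L2 distances between an original and a reconstructed mesh. The function name, the tuple-based vertex format, and the averaging over vertices are assumptions for illustration; the paper's exact definitions are not given in this listing.

```python
import math


def mean_vertex_errors(original, reconstructed):
    """Return (mean L1, mean L2) per-vertex distances between two meshes.

    Both arguments are lists of (x, y, z) tuples in matching vertex order.
    This is an illustrative definition, not the paper's confirmed metric.
    """
    if len(original) != len(reconstructed) or not original:
        raise ValueError("meshes must be non-empty with equal vertex counts")
    l1_total = 0.0
    l2_total = 0.0
    for p, q in zip(original, reconstructed):
        diffs = [a - b for a, b in zip(p, q)]
        l1_total += sum(abs(d) for d in diffs)            # L1 norm of the offset
        l2_total += math.sqrt(sum(d * d for d in diffs))  # L2 norm of the offset
    n = len(original)
    return l1_total / n, l2_total / n
```

Perceptual metrics such as FMPD and DAME depend on surface curvature and are not covered by this sketch.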
CURDIS: A template for incremental curve discretization algorithms and its application to conics
Virtual Reality Intelligent Hardware Pub Date : 2024-10-01 DOI: 10.1016/j.vrih.2024.06.005
Philippe Latour, Marc Van Droogenbroeck
Abstract
We introduce CURDIS, a template for algorithms to discretize arcs of regular curves by incrementally producing a list of support pixels covering the arc. In this template, algorithms proceed by finding the tangent quadrant at each point of the arc and determining which side the curve exits the pixel according to a tailored criterion. These two elements can be adapted for any type of curve, leading to algorithms dedicated to the shape of specific curves. While the calculation of the tangent quadrant for various curves, such as lines, conics, or cubics, is simple, it is more complex to analyze how pixels are traversed by the curve. In the case of conic arcs, we found a criterion for determining the pixel exit side. This leads us to present a new algorithm, called CURDIS-C, specific to the discretization of conics, for which we provide all the details. Surprisingly, the criterion for conics requires between one and three sign tests and four additions per pixel, making the algorithm efficient for resource-constrained systems and feasible for fixed-point or integer arithmetic implementations. Our algorithm also perfectly handles the pathological cases in which the conic intersects a pixel twice or changes quadrants multiple times within this pixel, achieving this generality at the cost of potentially computing up to two square roots per arc. We illustrate the use of CURDIS for the discretization of different curves, such as ellipses, hyperbolas, and parabolas, even when they degenerate into lines or corners.
Vol. 6, No. 5, pp. 358-382
Citations: 0
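The CURDIS abstract above describes finding the tangent quadrant at each point of the arc. One simple way to obtain it for a conic, sketched here under assumptions, is to rotate the gradient of the implicit equation by 90 degrees and keep only the signs of the result. The coefficient ordering, the function name, and the counterclockwise orientation are hypothetical choices; this is not the CURDIS-C pixel-exit criterion, which the abstract does not spell out.

```python
def tangent_direction_signs(coeffs, x, y):
    """Return the signs (sx, sy) of a tangent vector to a conic at (x, y).

    The conic is a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 and `coeffs`
    is the tuple (a, b, c, d, e, f). The tangent is the gradient rotated
    by +90 degrees; each returned sign is -1, 0, or 1.
    """
    a, b, c, d, e, _f = coeffs
    grad_x = 2 * a * x + b * y + d   # partial derivative with respect to x
    grad_y = b * x + 2 * c * y + e   # partial derivative with respect to y
    tx, ty = -grad_y, grad_x         # rotate the gradient by 90 degrees

    def sign(v):
        return (v > 0) - (v < 0)

    return sign(tx), sign(ty)
```

For the unit circle `(1, 0, 1, 0, 0, -1)`, the point `(1, 0)` gives an upward tangent, and a point in the first quadrant gives a tangent heading left and up, consistent with counterclockwise traversal.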
MKEAH: Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering
Virtual Reality Intelligent Hardware Pub Date : 2024-08-01 DOI: 10.1016/j.vrih.2023.06.002
Heng Zhang, Zhihua Wei, Guanming Liu, Rui Wang, Ruibin Mu, Chuanbao Liu, Aiquan Yuan, Guodong Cao, Ning Hu
Abstract
Background: External knowledge representations play an essential role in knowledge-based visual question and answering to better understand complex scenarios in the open world. Recent entity-relationship embedding approaches are deficient in representing some complex relations, resulting in a lack of topic-related knowledge and redundancy in topic-irrelevant information.
Methods: To this end, we propose MKEAH: Multimodal Knowledge Extraction and Accumulation on Hyperplanes. To ensure that the lengths of the feature vectors projected onto the hyperplane compare equally and to filter out sufficient topic-irrelevant information, two losses are proposed to learn the triplet representations from the complementary views: range loss and orthogonal loss. To interpret the capability of extracting topic-related knowledge, we present the Topic Similarity (TS) between topic and entity-relations.
Results: Experimental results demonstrate the effectiveness of hyperplane embedding for knowledge representation in knowledge-based visual question answering. Our model outperformed state-of-the-art methods by 2.12% and 3.24% on two challenging knowledge-request datasets: OK-VQA and KRVQA, respectively.
Conclusions: The obvious advantages of our model in TS show that using hyperplane embedding to represent multimodal knowledge can improve its ability to extract topic-related knowledge.
Vol. 6, No. 4, pp. 280-291
PDF: https://www.sciencedirect.com/science/article/pii/S2096579623000268/pdfft?md5=74ea90656cf281de7a0e35aa5b55705b&pid=1-s2.0-S2096579623000268-main.pdf
Citations: 0
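The MKEAH abstract above refers to feature vectors projected onto a hyperplane. The sketch below shows the generic geometric operation, removing the component of a vector along the hyperplane's unit normal. Whether MKEAH uses exactly this form is an assumption, the function name is hypothetical, and the paper's range loss and orthogonal loss are not reproduced because the abstract does not define them.

```python
import math


def project_onto_hyperplane(vector, normal):
    """Project `vector` onto the hyperplane through the origin with `normal`.

    Computes v - (v . w) w, where w is `normal` scaled to unit length.
    """
    norm = math.sqrt(sum(n * n for n in normal))
    if norm == 0:
        raise ValueError("normal vector must be non-zero")
    unit = [n / norm for n in normal]
    along = sum(v * w for v, w in zip(vector, unit))   # component along the normal
    return [v - along * w for v, w in zip(vector, unit)]
```

The projected vector is orthogonal to the normal by construction, so its length reflects only the part of the feature that lies in the hyperplane.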
Multi-scale context-aware network for continuous sign language recognition
Virtual Reality Intelligent Hardware Pub Date : 2024-08-01 DOI: 10.1016/j.vrih.2023.06.011
Senhua XUE, Liqing GAO, Liang WAN, Wei FENG
Abstract
The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, we find that existing Continuous Sign Language Recognition (CSLR) methods lack the mining of hand and face information in visual backbones or use expensive and time-consuming external extractors to explore this information. In addition, the signs have different lengths, whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the aforementioned problems. Our MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information of the hands and face in multiple spatial scales, replacing the heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video from different temporal scales. We conduct extensive experiments using three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
Vol. 6, No. 4, pp. 323-337
PDF: https://www.sciencedirect.com/science/article/pii/S2096579623000414/pdfft?md5=d9cac344d105f6ddc495c1cb1e50a67a&pid=1-s2.0-S2096579623000414-main.pdf
Citations: 0
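The MSCA-Net abstract above says the MSMA module uses differences among frames to perceive hand and face information. The underlying signal can be illustrated with plain absolute frame differencing, sketched below. The function name and the nested-list grayscale format are assumptions, and the attention mechanism and multi-scale processing described in the abstract are not modelled here.

```python
def frame_differences(frames):
    """Return absolute pixel-wise differences between consecutive frames.

    `frames` is a list of equally sized 2D lists of grayscale intensities.
    The result has one difference map per consecutive frame pair.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([
            [abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ])
    return diffs
```

Large values in a difference map mark pixels that changed between frames, which in signing videos tend to concentrate around moving hands and the face.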
Robust blind image watermarking based on interest points
Virtual Reality Intelligent Hardware Pub Date : 2024-08-01 DOI: 10.1016/j.vrih.2023.06.012
Zizhuo WANG, Kun HU, Chaoyangfan HUANG, Zixuan HU, Shuo YANG, Xingjun WANG
Abstract
Digital watermarking technology plays an essential role in the work of anti-counterfeiting and traceability. However, image watermarking algorithms are weak against hybrid attacks, especially geometric attacks, such as cropping attacks, rotation attacks, etc. We propose a robust blind image watermarking algorithm that combines stable interest points and deep learning networks to improve the robustness of the watermarking algorithm further. First, to extract more sparse and stable interest points, we use the Superpoint algorithm for generation and design two steps to perform the screening procedure. We first keep the points with the highest possibility in a given region to ensure the sparsity of the points and then filter the robust interest points by hybrid attacks to ensure high stability. The message is embedded in sub-blocks centered on stable interest points using a deep learning-based framework. Different kinds of attacks and simulated noise are added to the adversarial training to guarantee the robustness of embedded blocks. We use the ConvNext network for watermark extraction and determine the division threshold based on the decoded values of the unembedded sub-blocks. Through extensive experimental results, we demonstrate that our proposed algorithm can improve the accuracy of the network in extracting information while ensuring high invisibility between the embedded image and the original cover image. Comparison with previous SOTA work reveals that our algorithm can achieve better visual and numerical results on hybrid and geometric attacks.
Vol. 6, No. 4, pp. 308-322
PDF: https://www.sciencedirect.com/science/article/pii/S2096579623000426/pdfft?md5=0d46d851b07db92670b0c63431ec427e&pid=1-s2.0-S2096579623000426-main.pdf
Citations: 0
S2ANet: Combining local spectral and spatial point grouping for point cloud processing
Virtual Reality Intelligent Hardware Pub Date : 2024-08-01 DOI: 10.1016/j.vrih.2023.06.005
Yujie LIU, Xiaorui SUN, Wenbin SHAO, Yafu YUAN
Abstract
Background: Despite the recent progress in 3D point cloud processing using deep convolutional neural networks, the inability to extract local features remains a challenging problem. In addition, existing methods consider only the spatial domain in the feature extraction process.
Methods: In this paper, we propose a spectral and spatial aggregation convolutional network (S²ANet), which combines spectral and spatial features for point cloud processing. First, we calculate the local frequency of the point cloud in the spectral domain. Then, we use the local frequency to group points and provide a spectral aggregation convolution module to extract the features of the points grouped by the local frequency. We simultaneously extract the local features in the spatial domain to supplement the final features.
Results: S²ANet was applied in several point cloud analysis tasks; it achieved state-of-the-art classification accuracies of 93.8%, 88.0%, and 83.1% on the ModelNet40, ShapeNetCore, and ScanObjectNN datasets, respectively. For indoor scene segmentation, training and testing were performed on the S3DIS dataset, and the mean intersection over union was 62.4%.
Conclusions: The proposed S²ANet can effectively capture the local geometric information of point clouds, thereby improving accuracy on various tasks.
Vol. 6, No. 4, pp. 267-279
PDF: https://www.sciencedirect.com/science/article/pii/S2096579623000360/pdfft?md5=718a7d943dc6468abf44b38521bcc2cb&pid=1-s2.0-S2096579623000360-main.pdf
Citations: 0
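The S2ANet abstract above reports segmentation quality as mean intersection over union (mIoU). As a reminder of how that metric is commonly computed, the sketch below averages per-class IoU over point labels. The function name is hypothetical, and skipping classes absent from both prediction and ground truth is one common convention that the paper's evaluation protocol may or may not share.

```python
def mean_iou(predicted, target, num_classes):
    """Return the mean intersection over union across classes.

    `predicted` and `target` are equal-length sequences of integer class
    labels, one per point. Classes that appear in neither sequence are
    skipped (an assumed convention).
    """
    if len(predicted) != len(target):
        raise ValueError("label sequences must have the same length")
    ious = []
    for cls in range(num_classes):
        pairs = list(zip(predicted, target))
        intersection = sum(1 for p, t in pairs if p == cls and t == cls)
        union = sum(1 for p, t in pairs if p == cls or t == cls)
        if union > 0:
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in the labels")
    return sum(ious) / len(ious)
```

With predictions `[0, 0, 1, 1]` against ground truth `[0, 1, 1, 1]`, class 0 scores 1/2 and class 1 scores 2/3, so the mean is 7/12.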
Generating animatable 3D cartoon faces from single portraits
Virtual Reality Intelligent Hardware Pub Date : 2024-08-01 DOI: 10.1016/j.vrih.2023.06.010
Chuanyu PAN, Guowei YANG, Taijiang MU, Yu-Kun LAI
Abstract
Background: With the development of virtual reality (VR) technology, there is a growing need for customized 3D avatars. However, traditional methods for 3D avatar modeling are either time-consuming or fail to retain the similarity to the person being modeled. This study presents a novel framework for generating animatable 3D cartoon faces from a single portrait image.
Methods: First, we transferred an input real-world portrait to a stylized cartoon image using StyleGAN. We then proposed a two-stage reconstruction method to recover a 3D cartoon face with detailed texture. Our two-stage strategy initially performs coarse estimation based on template models and subsequently refines the model by nonrigid deformation under landmark supervision. Finally, we proposed a semantic-preserving face-rigging method based on manually created templates and deformation transfer.
Conclusions: Qualitative and quantitative results show that, compared with prior art, our method achieves better accuracy, aesthetics, and similarity. Furthermore, we demonstrated the capability of the proposed 3D model for real-time facial animation.
Vol. 6, No. 4, pp. 292-307
PDF: https://www.sciencedirect.com/science/article/pii/S2096579623000359/pdfft?md5=e0641053e4314662ffe5dca1c167d86b&pid=1-s2.0-S2096579623000359-main.pdf
Citations: 0