2-D to 3-D conversion of videos using fixed point learning approach

2016 9th International Conference on Electrical and Computer Engineering (ICECE) Pub Date : 2016-12-01 DOI:10.1109/ICECE.2016.7853844

Nidhi Chahal, S. Chaudhury


Citations: 0

Abstract


Depth cues from multiple images are useful for accurate depth extraction, while monocular cues from a single still image are more versatile. In this paper, a monocular cue, which gives useful information about a single frame, and depth from motion, computed using optical flow estimated from consecutive video frames, are used to produce the final depth maps. Machine learning is a promising new research direction in depth estimation and therefore in 2-D to 3-D conversion. A fast automatic technique is proposed that uses a fixed point learning framework to accurately estimate the depth maps of test images. For this task, a contextual prediction function is generated from a training database of 2-D color images and ground truth depth images. The depth maps obtained from the monocular and motion depth cues of the input video frames are used as input features for the learning process. The depths generated by the fixed point model are more accurate and reliable than a Markov random field (MRF) fusion of these depth cues. Stereo pairs are generated from the depth maps predicted by fixed point learning. These final stereo pairs are converted to a 3-D output video, which is displayed on a 3-D TV. For subjective evaluation, a mean opinion score (MOS) is calculated by showing the final 3-D video to different viewers wearing 3-D glasses.
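The fixed point idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the regressor, the neighbourhood, or the feature layout, so the linear ridge regressor, the 4-connected neighbour mean, and all function names below (`neighbor_mean`, `build_features`, `train`, `predict`) are assumptions chosen for illustration. The sketch shows only the general pattern: a contextual prediction function is trained with ground truth depth as context, and at test time it is applied repeatedly to its own output until the depth map stops changing.

```python
import numpy as np


def neighbor_mean(depth):
    """Mean of the 4-connected neighbours of every pixel (borders replicated)."""
    p = np.pad(depth, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def build_features(mono, motion, context):
    """Per-pixel features: monocular depth, motion depth, neighbour context, bias."""
    cols = [
        mono.ravel(),
        motion.ravel(),
        neighbor_mean(context).ravel(),
        np.ones(mono.size),
    ]
    return np.stack(cols, axis=1)


def train(mono, motion, gt, ridge=1e-3):
    """Fit a linear contextual prediction function (an assumed model choice).

    During training the context is taken from the ground truth depth map.
    """
    X = build_features(mono, motion, gt)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ gt.ravel())


def predict(mono, motion, w, max_iter=50, tol=1e-6):
    """Iterate q <- f(x, q_neighbours) until the depth map stops changing."""
    q = 0.5 * (mono + motion)  # assumed initial guess: average of the two cues
    for _ in range(max_iter):
        q_new = (build_features(mono, motion, q) @ w).reshape(mono.shape)
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q
```

A typical use would call `train` on cue maps paired with ground truth depth, then call `predict` on the cue maps of a new frame. The iteration is only guaranteed to settle when the learned weight on the neighbour context has magnitude below one, which this sketch does not enforce.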
