Using Segmentation to Enhance Frame Prediction in a Multi-Scale Spatial-Temporal Feature Extraction Network

2020 International Conference on Pervasive Artificial Intelligence (ICPAI) Pub Date : 2020-12-01 DOI:10.1109/ICPAI51961.2020.00038

Michael Mu-Chien Hsu, Richard Jui-Chun Shyur


Citations: 0

Abstract

Designing a machine that predicts future events is challenging even for existing state-of-the-art approaches. Such approaches require great computational power, whether for adversarial training or for segmentation and optical flow. By combining conventional segmentation with the DNN proposed in this paper, we obtain a simpler architecture that predicts both future frames and semantics effectively, efficiently, and more precisely than previous approaches. The input is a raw image sequence. In a top-down path, each frame is segmented for semantics, its spatial features are extracted, and its temporal features are analyzed at different scales; in a bottom-up path, the predicted frames and segmentation are then synthesized. Our model outperforms other state-of-the-art models in (1) precision of the predicted frames and (2) accuracy of the segmentation masks.
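
The data flow described in the abstract (features at several scales on the way down, synthesis on the way up) can be illustrated with a toy, non-learned sketch. It is not the authors' DNN: the 2x2 averaging, frame differencing, threshold-based segmentation, and every function name below are assumptions chosen only to make the two paths concrete.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of pixel values


def segment(frame: Frame, threshold: float = 0.5) -> Frame:
    """Toy 'conventional segmentation' (assumed): binary mask by thresholding."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in frame]


def downsample(frame: Frame) -> Frame:
    """Halve resolution by averaging 2x2 blocks (stand-in for a learned stage)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * i][2 * j] + frame[2 * i][2 * j + 1]
              + frame[2 * i + 1][2 * j] + frame[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def upsample(frame: Frame) -> Frame:
    """Double resolution by nearest-neighbour repetition."""
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(frame: Frame, levels: int) -> List[Frame]:
    """Spatial features at several scales, ordered fine to coarse."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def temporal_feature(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel change between two frames (assumed stand-in for temporal analysis)."""
    return [[c - p for p, c in zip(rp, rc)] for rp, rc in zip(prev, curr)]


def predict_next(frames: List[Frame], levels: int = 2) -> Frame:
    """Predict the next frame from the last two frames.

    Frame height and width must be divisible by 2 ** (levels - 1).
    """
    # Top-down path: multi-scale pyramids, then temporal change at each scale.
    prev_pyr = build_pyramid(frames[-2], levels)
    curr_pyr = build_pyramid(frames[-1], levels)
    motion = [temporal_feature(p, c) for p, c in zip(prev_pyr, curr_pyr)]

    # Bottom-up path: start at the coarsest scale and fuse into finer scales.
    fused = motion[-1]
    for finer in reversed(motion[:-1]):
        up = upsample(fused)
        fused = [[(a + b) / 2.0 for a, b in zip(ru, rf)]
                 for ru, rf in zip(up, finer)]

    # Synthesis: apply the fused change to the most recent frame.
    return [[x + m for x, m in zip(rx, rm)]
            for rx, rm in zip(frames[-1], fused)]


if __name__ == "__main__":
    f0 = [[0.0] * 4 for _ in range(4)]
    f1 = [[1.0] * 4 for _ in range(4)]
    next_frame = predict_next([f0, f1])
    # Run the same pipeline on the masks, then re-threshold the result.
    next_mask = segment(predict_next([segment(f0), segment(f1)]))
    print(next_frame[0], next_mask[0])
```

In this sketch the segmentation masks are simply passed through the same pipeline as the frames; how the paper actually couples segmentation with frame prediction is not specified in the abstract.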
