Title: Using Segmentation to Enhance Frame Prediction in a Multi-Scale Spatial-Temporal Feature Extraction Network
Authors: Michael Mu-Chien Hsu, Richard Jui-Chun Shyur
Published in: 2020 International Conference on Pervasive Artificial Intelligence (ICPAI), 2020-12-01
DOI: 10.1109/ICPAI51961.2020.00038 (https://doi.org/10.1109/ICPAI51961.2020.00038)
Citation count: 0
Abstract
Designing a machine to predict future events is a challenging problem, even for existing state-of-the-art approaches. These approaches require great computational power, whether for adversarial training or for segmentation and optical flow. By combining conventional segmentation with the DNN proposed in this paper, we obtain a simpler architecture that predicts both future frames and semantics effectively, efficiently, and more precisely than previous approaches. The input is a raw image sequence. In a top-down path, each frame is segmented for semantics, its spatial features are extracted, and its temporal features are analyzed at different scales; the predicted frames and segmentation are then synthesized in a bottom-up path. Our results show that the model outperforms other state-of-the-art approaches in (1) precision of the predicted frames and (2) accuracy of the segmentation masks.
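
The data flow described in the abstract (segment each frame, extract features at several scales in a top-down path, then synthesize the prediction in a bottom-up path) can be illustrated with a toy sketch. This is not the authors' network: the abstract does not specify its layers, so the sketch assumes threshold segmentation, 2x2 average pooling, frame differencing, and linear extrapolation as stand-ins for the learned components. All function names are hypothetical.

```python
# Illustrative sketch only: hand-written pooling and linear extrapolation
# stand in for the learned layers of the DNN, which the abstract does not specify.

def segment(frame, threshold=0.5):
    """Stand-in for conventional segmentation: a binary threshold mask."""
    return [[1.0 if px > threshold else 0.0 for px in row] for row in frame]

def downsample(grid):
    """2x2 average pooling (height and width are assumed to be even)."""
    return [
        [(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]

def upsample(grid):
    """Nearest-neighbour 2x upsampling."""
    out = []
    for row in grid:
        wide = [px for px in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def top_down(frames, levels):
    """Collect (last frame, temporal difference) at each scale, fine to coarse."""
    prev, last = frames[-2], frames[-1]
    pyramid = []
    for level in range(levels):
        motion = [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(prev, last)]
        pyramid.append((last, motion))
        if level < levels - 1:
            prev, last = downsample(prev), downsample(last)
    return pyramid

def bottom_up(pyramid):
    """Synthesize the prediction from the coarsest scale back to the finest."""
    prediction = None
    for last, motion in reversed(pyramid):
        # Linear extrapolation at this scale: last frame plus observed change.
        local = [[p + m for p, m in zip(rl, rm)] for rl, rm in zip(last, motion)]
        if prediction is None:
            prediction = local
        else:
            # Blend this scale's estimate with the upsampled coarser estimate.
            coarse = upsample(prediction)
            prediction = [[(a + b) / 2.0 for a, b in zip(ra, rb)]
                          for ra, rb in zip(local, coarse)]
    return prediction

def predict_next(frames, levels=2):
    """Predict the next frame and its segmentation mask from a frame sequence."""
    next_frame = bottom_up(top_down(frames, levels))
    masks = [segment(f) for f in frames]
    next_mask = segment(bottom_up(top_down(masks, levels)))
    return next_frame, next_mask
```

For example, given three uniform 4x4 frames with intensities 0, 1, and 2, `predict_next` returns a uniform frame of intensity 3 and an all-ones mask. The frame height and width must be divisible by 2 once for each level beyond the first.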