{"title":"基于对象间关系建模的图像描述生成","authors":"Lin Bai, Lina Yang, Lin Huo, Taosheng Li","doi":"10.1109/ICWAPR.2018.8521291","DOIUrl":null,"url":null,"abstract":"Automatically describing the content of an image is a challenging task in computer vision that connects the machine learning and natural language processing. In this paper, we present a framework, based on modeling image context, to generate natural sentences describing an image, which consists of two parts: relation modeling and description generating. By modeling the mapping from image spatial context to the logical relationship between objects, the former is trained to maximize the likelihood of the target linguistics phrase describing the relationship between object given the training image. By taking the the advantages of the syntactic-tree based method, the latter takes the predicted relationships as key ingredients to facilitate the image description generation within tree-growth process. We conduct extensive experimental evaluations on MS COCO dataset. Our framework outperforms the state-of-the-art methods. The results demonstrates that our framework provides robust and significant improvements for the relationship prediction between objects and the image description generation.","PeriodicalId":385478,"journal":{"name":"2018 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Image Description Generation by Modeling the Relationship Between Objects\",\"authors\":\"Lin Bai, Lina Yang, Lin Huo, Taosheng Li\",\"doi\":\"10.1109/ICWAPR.2018.8521291\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatically describing the content of an image is a challenging task in computer vision that connects the machine learning and natural language processing. 
In this paper, we present a framework, based on modeling image context, to generate natural sentences describing an image, which consists of two parts: relation modeling and description generating. By modeling the mapping from image spatial context to the logical relationship between objects, the former is trained to maximize the likelihood of the target linguistics phrase describing the relationship between object given the training image. By taking the the advantages of the syntactic-tree based method, the latter takes the predicted relationships as key ingredients to facilitate the image description generation within tree-growth process. We conduct extensive experimental evaluations on MS COCO dataset. Our framework outperforms the state-of-the-art methods. The results demonstrates that our framework provides robust and significant improvements for the relationship prediction between objects and the image description generation.\",\"PeriodicalId\":385478,\"journal\":{\"name\":\"2018 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR)\",\"volume\":\"71 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICWAPR.2018.8521291\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Conference on Wavelet Analysis and Pattern Recognition 
(ICWAPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICWAPR.2018.8521291","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Image Description Generation by Modeling the Relationship Between Objects
Automatically describing the content of an image is a challenging task in computer vision that connects machine learning and natural language processing. In this paper, we present a framework, based on modeling image context, that generates natural sentences describing an image. The framework consists of two parts: relation modeling and description generation. The former models the mapping from image spatial context to the logical relationships between objects, and is trained to maximize the likelihood of the target linguistic phrase describing the relationship between objects given the training image. The latter exploits the advantages of the syntax-tree-based method, taking the predicted relationships as key ingredients to guide image description generation within the tree-growth process. We conduct extensive experimental evaluations on the MS COCO dataset, where our framework outperforms state-of-the-art methods. The results demonstrate that our framework yields robust and significant improvements in both relationship prediction between objects and image description generation.
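The relation-modeling stage described above — learning a mapping from spatial context to a relationship phrase by maximizing the likelihood of the target phrase given the image — can be sketched as a multinomial logistic model over bounding-box offsets. This is an illustrative toy, not the authors' implementation: the relation vocabulary, feature function, and synthetic training data below are all hypothetical assumptions.

```python
import math
import random

# Hypothetical relation vocabulary; the paper's actual phrase set is not given.
RELATIONS = ["above", "below", "left of", "right of"]

def spatial_features(box_a, box_b):
    """Crude spatial context: offset between box centers, plus a bias term.
    Boxes are (x, y, w, h) in normalized image coordinates (y grows downward)."""
    ax, ay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx, by = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return [bx - ax, by - ay, 1.0]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class RelationModel:
    """Softmax classifier trained to maximize the log-likelihood of the
    target relation phrase given spatial features (stochastic gradient ascent)."""

    def __init__(self, n_features=3):
        self.w = [[0.0] * n_features for _ in RELATIONS]

    def probs(self, feats):
        return softmax([sum(wi * fi for wi, fi in zip(row, feats))
                        for row in self.w])

    def train(self, data, lr=0.1, epochs=100):
        for _ in range(epochs):
            for feats, label in data:
                p = self.probs(feats)
                for k in range(len(RELATIONS)):
                    # Gradient of log p(label | feats) for a softmax model
                    grad = (1.0 if k == label else 0.0) - p[k]
                    for j in range(len(feats)):
                        self.w[k][j] += lr * grad * feats[j]

    def predict(self, box_a, box_b):
        p = self.probs(spatial_features(box_a, box_b))
        return RELATIONS[p.index(max(p))]

# Synthetic supervision: label each random box pair by its dominant center offset.
random.seed(0)

def make_example():
    a = (random.random(), random.random(), 0.2, 0.2)
    b = (random.random(), random.random(), 0.2, 0.2)
    dx, dy = b[0] - a[0], b[1] - a[1]
    if abs(dy) > abs(dx):
        label = RELATIONS.index("below" if dy > 0 else "above")
    else:
        label = RELATIONS.index("right of" if dx > 0 else "left of")
    return spatial_features(a, b), label

model = RelationModel()
model.train([make_example() for _ in range(400)])
print(model.predict((0.1, 0.5, 0.2, 0.2), (0.8, 0.5, 0.2, 0.2)))
```

In the paper's setting, the input would presumably be richer visual context from a real detector rather than raw box offsets; the sketch only illustrates the maximum-likelihood training objective for mapping spatial context to a relation phrase.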