{"title":"Factors Influencing The Performance of Image Captioning Model: An Evaluation","authors":"Duc-Cuong Dao, Thi-Oanh Nguyen, S. Bressan","doi":"10.1145/3007120.3007136","journal":{"name":"Proceedings of the 14th International Conference on Advances in Mobile Computing and Multi Media"},"publicationDate":"2016-11-28","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3007120.3007136"}
Factors Influencing The Performance of Image Captioning Model: An Evaluation
Recently, neural network-based methods have shown impressive performance on the image captioning task, and numerous architectures have been proposed to solve it. In this paper, we evaluate alternative architectures and optimization algorithms for a neural image captioning model. First, we study an image captioning model composed of two modules: a convolutional neural network that encodes the input image into a fixed-dimensional feature vector, and a recurrent neural network that decodes that representation into a sequence of words describing the image. We then consider alternative architectures and optimization algorithms for training the model. We conduct a set of experiments on standard benchmark datasets to evaluate different aspects of the captioning system, using the standard evaluation methods of the image captioning literature. Based on the results, we offer several suggestions on the architecture and optimization algorithm for an image captioning model that balances performance against the feasibility of deployment on real-world problems with commodity hardware.
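The two-module data flow described in the abstract (image, then fixed-dimensional feature vector, then word sequence) can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary, dimensions, pooling-based `encode` stand-in for the convolutional network, the plain tanh recurrent cell, greedy decoding, and the random untrained weights are all assumptions chosen to keep the example self-contained in standard-library Python.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

# Hypothetical toy vocabulary; a real system builds this from training captions.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
FEAT_DIM = 4     # size of the fixed-dimensional image feature vector (assumed)
HIDDEN_DIM = 8   # size of the recurrent hidden state (assumed)
MAX_LEN = 10     # hard cap on caption length (assumed)


def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these from data."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


W_INIT = rand_matrix(HIDDEN_DIM, FEAT_DIM)    # image features -> initial state
W_HH = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)    # recurrent weights
EMBED = rand_matrix(len(VOCAB), HIDDEN_DIM)   # one input embedding per word
W_OUT = rand_matrix(len(VOCAB), HIDDEN_DIM)   # hidden state -> word scores


def encode(image):
    """Stand-in for the CNN encoder: pool a grayscale image (list of rows)
    of any size into exactly FEAT_DIM averages."""
    flat = [pixel for row in image for pixel in row]
    chunk = math.ceil(len(flat) / FEAT_DIM)
    feats = []
    for i in range(FEAT_DIM):
        part = flat[i * chunk:(i + 1) * chunk]
        feats.append(sum(part) / len(part) if part else 0.0)
    return feats


def decode(feats):
    """RNN decoder: start from the image features, then greedily emit one
    word per step until <end> is chosen or MAX_LEN words are produced."""
    hidden = [math.tanh(x) for x in matvec(W_INIT, feats)]
    word = VOCAB.index("<start>")
    caption = []
    for _ in range(MAX_LEN):
        recurrent = matvec(W_HH, hidden)
        hidden = [math.tanh(r + e) for r, e in zip(recurrent, EMBED[word])]
        scores = matvec(W_OUT, hidden)
        # Index 0 is <start>, which is never a valid output word.
        word = max(range(1, len(VOCAB)), key=scores.__getitem__)
        if VOCAB[word] == "<end>":
            break
        caption.append(VOCAB[word])
    return caption


def caption_image(image):
    """Full pipeline: encode the image, then decode a word sequence."""
    return decode(encode(image))
```

Because the weights are random, the emitted words are meaningless; the sketch only shows how images of different sizes map to the same feature dimension and how the decoder consumes that vector. For example, `caption_image([[0.1, 0.9], [0.4, 0.6], [0.3, 0.7]])` returns a list of at most `MAX_LEN` vocabulary words.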