Krishnan Ramnath, Simon Baker, Lucy Vanderwende, M. El-Saban, Sudipta N. Sinha, A. Kannan, N. Hassan, Michel Galley, Yi Yang, Deva Ramanan, Alessandro Bergamo, L. Torresani
Title: AutoCaption: Automatic caption generation for personal photos
Venue: IEEE Winter Conference on Applications of Computer Vision, pp. 1050-1057
Published: 2014-03-24
DOI: 10.1109/WACV.2014.6835988 (https://doi.org/10.1109/WACV.2014.6835988)
Citations: 26
Abstract
AutoCaption is a system that helps smartphone users generate captions for their photos. It uploads the photo to a cloud service, where a number of parallel modules recognize a variety of entities and relations. The outputs of the modules are combined to generate a large set of candidate captions, which are returned to the phone. The phone client provides a user interface in which users select their favorite caption and then reorder, add, or delete words to obtain the grammatical style they prefer. Users can also select from the multiple candidates returned by the recognition modules.
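
The pipeline described in the abstract (parallel recognition modules whose outputs are combined into candidate captions) can be illustrated with a minimal sketch. This is not the authors' implementation. The module names (`recognize_people`, `recognize_scene`, `recognize_landmark`), their hard-coded outputs, and the combination rule (one candidate per module, joined in module order) are all assumptions made for illustration; the abstract does not specify any of them.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical recognition modules (assumptions, not from the paper).
# Each one returns a list of candidate phrases for its part of the caption.
def recognize_people(photo):
    return ["Alice", "a friend"]

def recognize_scene(photo):
    return ["at the beach", "by the sea"]

def recognize_landmark(photo):
    return ["near the Golden Gate Bridge"]

MODULES = [recognize_people, recognize_scene, recognize_landmark]

def run_modules(photo):
    """Apply every module to the photo in parallel, preserving module order."""
    with ThreadPoolExecutor(max_workers=len(MODULES)) as pool:
        futures = [pool.submit(module, photo) for module in MODULES]
        return [future.result() for future in futures]

def candidate_captions(photo):
    """Build one caption per combination of module candidates (assumed rule)."""
    outputs = run_modules(photo)
    return [" ".join(parts) for parts in product(*outputs)]

# Placeholder bytes stand in for an uploaded photo.
captions = candidate_captions(b"raw image bytes")
for caption in captions:
    print(caption)
```

In this sketch the candidate set grows multiplicatively with the number of alternatives each module returns, which is one plausible reason a client interface for selecting and editing captions would be useful.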