An Augmented Image Captioning Model: Incorporating Hierarchical Image Information
Nathan Funckes, Erin Carrier, Greg Wolffe
2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1608-1614, 2021-12-01
DOI: 10.1109/ICMLA52953.2021.00257 (https://doi.org/10.1109/ICMLA52953.2021.00257)
Despite published accessibility standards, many websites remain non-compliant, containing images that lack accompanying textual descriptions. This leaves visually impaired individuals unable to fully enjoy the rich wonders of the web. To help address this inequity, our research seeks to improve the ability of autonomous systems to generate accurate, relevant image descriptions. Our model enhances training efficacy by incorporating category labels (high-level object superclasses), which can be derived using modern object-detection models. We show that this simple augmentation to an existing architecture results in a statistically significant improvement in caption quality.