{"title":"基于NCAM图像可及性准则的图像描述自动评价的神经网络模型与框架","authors":"R. Shrestha","doi":"10.1145/3508259.3508269","DOIUrl":null,"url":null,"abstract":"Millions of people who are either blind or visually impaired have difficulty understanding the content in an image. To address the problem textual image descriptions or captions are provided separately or as alternative texts on the web so that the users can read them through a screen reader. However, most of the image descriptions provided are inadequate to make them accessible enough. Image descriptions could be written either manually or automatically generated using software tools. There are tools, methods, and metrics used to evaluate the quality of the generated text. However, almost all of them are word-similarity-based and generic. Even though there are standard guidelines such as WCAG2.0 and NCAM image accessibility guidelines, they are rarely used in the evaluation of image descriptions. In this paper, we propose a neural network-based framework and models for an automatic evaluation of image descriptions in terms of compliance with the NCAM guidelines. A custom dataset was created from a widely used Flickr8K dataset to train and test the models. The experimental results show the proposed framework performing very well with an average accuracy of above 98%. We believe that the framework could be helpful and useful for the authors of image descriptions in writing accessible image descriptions for the users.","PeriodicalId":259099,"journal":{"name":"Proceedings of the 2021 4th Artificial Intelligence and Cloud Computing Conference","volume":"89 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Neural Network Model and Framework for an Automatic Evaluation of Image Descriptions based on NCAM Image Accessibility Guidelines\",\"authors\":\"R. Shrestha\",\"doi\":\"10.1145/3508259.3508269\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Millions of people who are either blind or visually impaired have difficulty understanding the content in an image. To address the problem textual image descriptions or captions are provided separately or as alternative texts on the web so that the users can read them through a screen reader. However, most of the image descriptions provided are inadequate to make them accessible enough. Image descriptions could be written either manually or automatically generated using software tools. There are tools, methods, and metrics used to evaluate the quality of the generated text. However, almost all of them are word-similarity-based and generic. Even though there are standard guidelines such as WCAG2.0 and NCAM image accessibility guidelines, they are rarely used in the evaluation of image descriptions. In this paper, we propose a neural network-based framework and models for an automatic evaluation of image descriptions in terms of compliance with the NCAM guidelines. A custom dataset was created from a widely used Flickr8K dataset to train and test the models. The experimental results show the proposed framework performing very well with an average accuracy of above 98%. 
We believe that the framework could be helpful and useful for the authors of image descriptions in writing accessible image descriptions for the users.\",\"PeriodicalId\":259099,\"journal\":{\"name\":\"Proceedings of the 2021 4th Artificial Intelligence and Cloud Computing Conference\",\"volume\":\"89 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 4th Artificial Intelligence and Cloud Computing Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3508259.3508269\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 4th Artificial Intelligence and Cloud Computing Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508259.3508269","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Neural Network Model and Framework for an Automatic Evaluation of Image Descriptions based on NCAM Image Accessibility Guidelines
Millions of people who are blind or visually impaired have difficulty understanding the content of an image. To address this problem, textual image descriptions or captions are provided separately or as alternative text on the web so that users can read them through a screen reader. However, most of the image descriptions provided are not adequate to make images truly accessible. Image descriptions can be written manually or generated automatically using software tools, and various tools, methods, and metrics exist to evaluate the quality of the generated text. However, almost all of them are generic and based on word similarity. Even though standard guidelines such as WCAG 2.0 and the NCAM image accessibility guidelines exist, they are rarely used in the evaluation of image descriptions. In this paper, we propose a neural network-based framework and models for the automatic evaluation of image descriptions in terms of compliance with the NCAM guidelines. A custom dataset was created from the widely used Flickr8K dataset to train and test the models. The experimental results show that the proposed framework performs very well, with an average accuracy above 98%. We believe the framework could help authors of image descriptions write accessible descriptions for their users.
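The abstract does not specify the model architecture or feature representation, so the following is only a minimal sketch of the kind of task the paper describes: a neural network that classifies an image description as compliant or non-compliant with an accessibility guideline. The bag-of-words encoding, the toy labels, and the one-hidden-layer network below are illustrative assumptions, not the authors' actual model or dataset.

```python
# Hypothetical sketch: binary guideline-compliance classifier for
# image descriptions. Assumes PyTorch and scikit-learn are installed.
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import CountVectorizer

# Toy (description, compliant?) pairs. In the paper, labels would come
# from a custom dataset derived from Flickr8K captions.
texts = [
    "A brown dog runs across a grassy field.",    # descriptive: compliant
    "dog",                                         # too terse: non-compliant
    "Two children play soccer near a red fence.",  # compliant
    "image123.jpg",                                # filename, not a description
]
labels = torch.tensor([1.0, 0.0, 1.0, 0.0]).unsqueeze(1)

# Bag-of-words features; the paper does not state its text encoding.
vectorizer = CountVectorizer()
X = torch.tensor(vectorizer.fit_transform(texts).toarray(),
                 dtype=torch.float32)

# Small feed-forward network with a sigmoid output for binary
# compliance classification.
model = nn.Sequential(
    nn.Linear(X.shape[1], 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.Sigmoid(),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.BCELoss()

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), labels)
    loss.backward()
    optimizer.step()

# Score an unseen description: values near 1.0 suggest compliance.
new = torch.tensor(
    vectorizer.transform(["A man rides a bicycle down a city street."]).toarray(),
    dtype=torch.float32,
)
print(model(new).item())
```

In practice, a per-guideline framework like the one proposed would presumably train one such classifier (or output head) per NCAM rule and aggregate the results into an overall compliance report; the sketch above shows only a single rule for clarity.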