Itamar Rocha Filho, Felipe Honorato, J. W. Lucena, J. P. Teixeira, T. Araújo
Published in: Proceedings of the Brazilian Symposium on Multimedia and the Web, 2021-09-27. DOI: 10.1145/3470482.3479617 (https://doi.org/10.1145/3470482.3479617)
An Approach for Automatic Description of Characters for Blind People
Audio Description (AD), also called Video Description, is a vital accessibility resource for blind and visually impaired people. Automating it is difficult because it involves several subproblems, such as describing the scene, actions, emotions, and characters. This paper presents an approach to automatically describe the characters in a video or image by combining Deep Learning (DL), face detection, facial expression detection, and audio synthesizers. The approach runs the detection tools, applies DL models to the detected data, and generates an audio description. To assess its feasibility, we developed a proof of concept and evaluated it through computational experiments.
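The abstract does not give implementation details, so the following is only a minimal sketch of the text-generation step such a pipeline might contain, not the authors' method. It assumes that a face detector and an expression classifier have already produced, for each face, a normalised horizontal position and an expression label. The names `DetectedFace`, `horizontal_position`, and `describe_characters` are hypothetical, and the resulting sentence would still need to be passed to a speech synthesizer, which is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedFace:
    """Hypothetical output of the face and expression detectors (an assumption)."""
    x_center: float   # horizontal face centre, normalised to [0, 1]
    expression: str   # label from an expression classifier, e.g. "happy"


def horizontal_position(x_center: float) -> str:
    """Map a normalised x coordinate to a coarse spoken position."""
    if x_center < 1 / 3:
        return "on the left"
    if x_center < 2 / 3:
        return "in the center"
    return "on the right"


def describe_characters(faces: List[DetectedFace]) -> str:
    """Build one sentence describing every detected character, left to right."""
    if not faces:
        return "No characters are visible."
    ordered = sorted(faces, key=lambda face: face.x_center)
    parts = [
        f"a person {horizontal_position(face.x_center)} who looks {face.expression}"
        for face in ordered
    ]
    noun = "character" if len(ordered) == 1 else "characters"
    return f"{len(ordered)} {noun}: " + "; ".join(parts) + "."


if __name__ == "__main__":
    frame = [DetectedFace(0.8, "sad"), DetectedFace(0.1, "happy")]
    print(describe_characters(frame))
```

Sorting by horizontal position keeps the spoken order predictable for the listener; a real system would likely also need to track characters across frames so that the same person is not described repeatedly.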