Authors: Shahira K C; Pulkit Joshi; Lijiya A
Journal: IEEE Open Journal of the Computer Society, vol. 4, pp. 314-325
Published: 2023-10-31
DOI: 10.1109/OJCS.2023.3328767
URL: https://ieeexplore.ieee.org/document/10302417/
Data Extraction and Question Answering on Chart Images Towards Accessibility and Data Interpretation
Graphical representations such as chart images are integral to web pages and documents. Automating data extraction from charts is possible by reverse-engineering the visualization pipeline. This study proposes a framework that automates data extraction from bar charts and integrates it with question answering. The framework employs an object detector to recognize visual cues in the image, followed by text recognition. Mask-RCNN for plot element detection achieves a mean average precision of 95.04% at an Intersection over Union (IoU) threshold of 0.5; this precision decreases as the IoU threshold increases. A contour approximation-based approach is proposed for extracting the bar coordinates, and it remains effective even at a higher IoU of 0.9. The textual and visual cues are associated with the legend text and preview, and the chart data is finally extracted in tabular format. We introduce TAPAS++, an extension to the TAPAS model that incorporates new operations, and use it for table question answering. The chart summary or description is also produced in an audio format. In the future, this approach could be expanded to enable interactive question answering on charts by accepting audio inquiries from individuals with visual impairments, and to perform more complex reasoning using Large Language Models.
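To make the IoU thresholds mentioned in the abstract concrete, here is a minimal Python sketch of how IoU is commonly computed for two axis-aligned boxes and how a threshold turns it into a match decision. It is an illustration only, not the authors' implementation: the (x_min, y_min, x_max, y_max) box format, the function names iou and is_match, and the example coordinates are all assumptions made for this sketch.

```python
from typing import Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float) -> bool:
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Made-up example: a predicted bar that is 15 px shorter than the true bar.
truth = (10.0, 20.0, 30.0, 120.0)
pred = (10.0, 35.0, 30.0, 120.0)

score = iou(pred, truth)
print(round(score, 2))
print(is_match(pred, truth, 0.5), is_match(pred, truth, 0.9))
```

In this hypothetical case the same detection passes at a threshold of 0.5 but fails at 0.9, which is the general reason detection precision tends to fall as the IoU threshold is raised, and why tighter bar boundaries matter when pixel heights are later converted to data values.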