Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review
Moseli Mots'oehli
arXiv - CS - Emerging Technologies, published 2024-06-28. DOI: arxiv-2407.00252 (https://doi.org/arxiv-2407.00252)
While supervised learning has achieved significant success in computer vision
tasks, acquiring high-quality annotated data remains a bottleneck. This paper
explores both scholarly and non-scholarly works in AI-assistive deep learning
image annotation systems that provide textual suggestions, captions, or
descriptions of the input image to the annotator. Such suggestions can potentially
improve annotation efficiency and quality. Our exploration covers annotation for
a range of computer vision tasks including image classification, object
detection, regression, instance and semantic segmentation, and pose estimation. We
review various datasets and how they contribute to the training and evaluation
of AI-assistive annotation systems. We also examine methods leveraging
neuro-symbolic learning, deep active learning, and self-supervised learning
algorithms that enable semantic image understanding and generate free-text
output. These include image captioning, visual question answering, and
multi-modal reasoning. Despite this potential, there is limited
publicly available work on AI-assistive image annotation with textual output
capabilities. We conclude by suggesting future research directions to advance
this field, emphasizing the need for more publicly accessible datasets and
collaborative efforts between academia and industry.