{"title":"Robust interactive segmentation via coloring","authors":"Ozan Sener, K. Ugur, Aydin Alatan","doi":"10.1145/2304496.2304505","DOIUrl":"https://doi.org/10.1145/2304496.2304505","url":null,"abstract":"User centered and high performance interactive image segmentation is important for ground truth annotation tools as well as many mobile computer graphics and vision applications which requires user to select an object from non-trivial background. On the other hand, These methods should also tolerate interaction errors due to the small sized screens of the mobile devices. We namely propose a new interaction mechanism called coloring. Coloring is completely dynamic interaction methodology and specifically designed for touchscreen devices and mobile applications. Moreover, in order to compensate interaction errors, we have proposed a mechanism to handle errors. Proposed interaction methodology is also tested subjectively. And, superior performance of both interaction methodology and error-correction mechanism is also presented.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127189267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-view annotation tool for people detection evaluation","authors":"Á. Utasi, C. Benedek","doi":"10.1145/2304496.2304499","DOIUrl":"https://doi.org/10.1145/2304496.2304499","url":null,"abstract":"In this paper we introduce a novel multi-view annotation tool for generating 3D ground truth data of the real location of people in the scene. The proposed tool allows the user to accurately select the ground occupancy of people by aligning an oriented rectangle on the ground plane. In addition, the height of the people can also be adjusted. In order to achieve precise ground truth data the user is aided by the video frames of multiple synchronized and calibrated cameras. Finally, the 3D annotation data can be easily converted to 2D image positions using the available calibration matrices. One key advantage of the proposed technique is that different methods can be compared against each other, whether they estimate the real world ground position of people or the 2D position on the camera images. Therefore, we defined two different error metrics, which quantitatively evaluate the estimated positions. We used the proposed tool to annotate two publicly available datasets, and evaluated the metrics on two state of the art algorithms.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115265544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiscale annotation of still images with GAT","authors":"Xavier Giró-i-Nieto, Manel Martos","doi":"10.1145/2304496.2304497","DOIUrl":"https://doi.org/10.1145/2304496.2304497","url":null,"abstract":"This paper presents GAT, a Graphical Annotation Tool for still images that works both at the global and local scales. This interface has been designed to assist users in the annotation of images with relation to the semantic classes described in an ontology. Positive, negative and neutral labels can be assigned to both the whole images or parts of them. The user interface is capable of exploiting segmentation data to assist in the selection of objects. Moreover, the annotation capabilities are complemented with additional functionalities that allow the creation and evaluation of an image classifier. The implemented Java source code is published under a free software license.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116226225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoVidA: pen-based collaborative video annotation","authors":"Tobias Zimmermann, Markus Weber, M. Liwicki, D. Stricker","doi":"10.1145/2304496.2304506","DOIUrl":"https://doi.org/10.1145/2304496.2304506","url":null,"abstract":"In this paper, we propose a pen-based annotation tool for videos. Annotating videos is an exhausting task, but it has a great benefit for several communities, as labeled ground truth data is the foundation for supervised machine learning approaches. Thus, there is need for an easy-to-use tool which assists users with labeling even complex structures. For outlining and labeling the shape of an object, we introduce a pen-based interface which combines pen and touch input. In our experiments we show that especially for complex structures the usage of a pen device improves the effectiveness of the outlining process.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"51 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126340763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient annotation of traffic video data","authors":"J. M. Mossi, A. Albiol, A. Albiol","doi":"10.1145/2304496.2304503","DOIUrl":"https://doi.org/10.1145/2304496.2304503","url":null,"abstract":"This paper presents a software application to generate ground-truth data on video files from traffic surveillance cameras used for Intelligent Transportation Systems (IT systems). The computer vision system to be evaluated measures the number of vehicles that cross a line per time unit --intensity-, the speed and the occupancy. A typical scenario is a camera on a pole 5 to 12m high pointing to the street in a city or to the lanes in a motorway. The application presented here is a tool to navigate through the video and annotate each instant when a vehicle crosses the target line and other features like its speed. The main target of the visual interface presented in this paper is to be easy to use, and with easy to find and non-specific hardware. It is based on a standard laptop or desktop computer and a Jog shuttle wheel, affordable and very common in Broadcast Video Edition. The setup is efficient and comfortable because one hand of the annotating person is almost all the time on the space key of the keyboard while the other hand is on the jog shuttle wheel. The mean time required to annotate a video file ranges from 1 to 5 times its duration (per lane) depending on the content.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125657402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A semi-automatic tool for detection and tracking ground truth generation in videos","authors":"I. Kavasidis, S. Palazzo, R. Salvo, D. Giordano, C. Spampinato","doi":"10.1145/2304496.2304502","DOIUrl":"https://doi.org/10.1145/2304496.2304502","url":null,"abstract":"In this paper we present a tool for the generation of ground-truth data for object detection, tracking and recognition applications. Compared to state of the art methods, such as ViPER-GT, our tool improves the user experience by providing edit shortcuts such as hotkeys and drag-and-drop, and by integrating computer vision algorithms to automate, under the supervision of the user, the extraction of contours and the identification of objects across frames. A comparison between our application and ViPER-GT tool was performed, which showed how our tool allows users to label a video in a shorter time, while at the same time providing a higher ground truth quality.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"424 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134241030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An annotation tool for dermoscopic image segmentation","authors":"P. Ferreira, T. Mendonça, J. Rozeira, P. Rocha","doi":"10.1145/2304496.2304501","DOIUrl":"https://doi.org/10.1145/2304496.2304501","url":null,"abstract":"Dermoscopy is a non-invasive diagnostic technique for the in vivo observation of pigmented skin lesions, and it is currently one of the most important imaging techniques for melanoma diagnosis.\u0000 Since the diagnostic accuracy of dermoscopy significantly depends on the experience of the dermatologists, and the visual interpretation and examination of this kind of images is time consuming, several computer-aided diagnosis systems of digital dermoscopic images have been introduced.\u0000 However, a reliable ground truth database of manually segmented images is necessary for the development and validation of automatic segmentation and classification methods. As the ground truth database have to be created by expert dermatologists, there is a need for the development of annotation tools that can support the manual segmentation of dermoscopic images, and this way make this task easier and practicable for dermatologists.\u0000 In this paper we present an annotation tool for manual segmentation of dermoscopic images. This tool allows building up a ground truth database with the manual segmentations both of pigmented skin lesions and of other regions of interest, whose recognition is essential for the development of computer-aided diagnosis systems.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126054562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-tool for ground-truth stereo correspondence, object outlining and points-of-interest selection","authors":"B. Cyganek, K. Socha","doi":"10.1145/2304496.2304500","DOIUrl":"https://doi.org/10.1145/2304496.2304500","url":null,"abstract":"In this paper we present architecture and functionality of the visual multi-tool designed for acquisition of ground-truth and reference data for computer vision experimentation. The multi-tool allows three main functions, namely manual matching of the corresponding points for multi-view correlation, outlining of the image objects with polygons, as well as selection of characteristic points in specific image areas. These functions allows gathering of experimental data which are used for training and/or verification in such computer vision methods as stereo correlation, road signs detection and recognition, as well as color based segmentation. We present overview of the experimental results which were made possible with this multi-tool, as well as we discuss its potential further applications and extensions. The presented software platform was made available on the Internet.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129466622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An interactive tool for extremely dense landmarking of faces","authors":"B. Barbour, K. Ricanek","doi":"10.1145/2304496.2304509","DOIUrl":"https://doi.org/10.1145/2304496.2304509","url":null,"abstract":"The purpose of this paper is to introduce a tool that provides a GUI for generating a ground truth for landmark positions for 2D images used in computer vision applications. Further, we demonstrate via a case study that this tool greatly improves manual landmarking in the case of extremely dense (more than 250 points per images) annotation of face images with a factor of two speed up. Moreover the tool incorporates workflow technology capable of allowing multiple land-markers to work on the same set of images and for quality assurance checks. We are in the process of making this tool freely available to researchers from academia.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134395533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthetic ground truth dataset to detect shadows cast by static objects in outdoors","authors":"C. Isaza, Joaquin Salas, B. Raducanu","doi":"10.1145/2304496.2304507","DOIUrl":"https://doi.org/10.1145/2304496.2304507","url":null,"abstract":"In this paper, we propose a precise synthetic ground truth dataset to study the problem of detection of the shadows cast by static objects in outdoor environments during extended periods of time (days). For our dataset, we have created a virtual scenario using a rendering software. To increase the realism of the simulated environment, we have defined the scenario in a precise geographical location. In our dataset the sun is by far the main illumination source. The sun position during the simulation time takes into consideration factors related to the geographical location, such as the latitude, longitude, elevation above sea level, and precise image capturing day and time. In our simulation the camera remains fixed. The dataset consists of seven days of simulation, from 10:00am to 5:00pm. Images are captured every 10 seconds. The shadows' ground truth is automatically computed by the rendering software.","PeriodicalId":196376,"journal":{"name":"International Workshop on Video and Image Ground Truth in Computer Vision Applications","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128070998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}