Visualization of Individual Variation of Multiple Annotators Working on Training Datasets for Machine Learning
T. Itoh, Ayana Murakami
2020 Nicograph International (NicoInt), 2020-06-01
DOI: 10.1109/NicoInt50878.2020.00022
Citations: 1
Abstract
The quality of a training dataset is essential to the quality of machine learning. Machine learning projects often recruit multiple workers to perform the annotation tasks needed to create training datasets. To guarantee the quality of these datasets, it is important to observe which types of content lead workers to annotate differently, and which workers often make abnormal annotations. This paper presents a tool for visualizing the abnormality of annotations made by multiple workers. The tool generates a matrix of annotation abnormality for each image and each worker and displays it as a heatmap. This paper introduces an example using a training dataset in which eight workers annotated 7,748 pictures of human faces with estimated ages.
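The image-by-worker abnormality matrix described above can be sketched as follows. The abstract does not define how abnormality is measured, so this is only an illustration under a stated assumption: abnormality is taken to be the absolute deviation of a worker's estimated age from the per-image median over all workers. The function names `abnormality_matrix` and `worker_summary` and the toy ages are hypothetical and are not taken from the paper.

```python
import numpy as np


def abnormality_matrix(ages):
    """Return an (n_images, n_workers) matrix of annotation abnormality.

    ASSUMPTION: abnormality is the absolute deviation of each estimated age
    from the per-image median across workers. The paper may use a different
    measure; this choice is only for illustration.
    """
    ages = np.asarray(ages, dtype=float)
    # Per-image consensus value, kept as a column so it broadcasts over workers.
    consensus = np.median(ages, axis=1, keepdims=True)
    return np.abs(ages - consensus)


def worker_summary(abnormality):
    """Mean abnormality per worker (one value per column)."""
    return abnormality.mean(axis=0)


# Made-up toy data: 4 images (rows) annotated by 3 workers (columns).
toy_ages = np.array([
    [25, 27, 40],
    [31, 30, 45],
    [18, 20, 19],
    [60, 58, 75],
])

matrix = abnormality_matrix(toy_ages)
per_worker = worker_summary(matrix)
```

In this toy data the third worker tends to estimate much higher ages than the others, so that column of `matrix` would stand out when the matrix is drawn as a heatmap, for example with matplotlib's `imshow`. Rows with uniformly large values would instead point to images on which workers disagree.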