Khaled Alhazmi, Walaa Alsumari, Indrek Seppo, L. Podkuiko, Martin Simon
{"title":"标注质量对模型性能的影响","authors":"Khaled Alhazmi, Walaa Alsumari, Indrek Seppo, L. Podkuiko, Martin Simon","doi":"10.1109/ICAIIC51459.2021.9415271","DOIUrl":null,"url":null,"abstract":"Supervised machine learning generally requires pre-labelled data. Although there are several open access and pre-annotated datasets available for training machine learning algorithms, most contain a limited number of object classes, which may not be suitable for specific tasks. As previously available pre-annotated data is not usually sufficient for custom models, most of the real world applications require collecting and preparing training data. There is an obvious trade-off between annotation quality and quantity. Time and resources can be allocated for ensuring superior data quality or for increasing the quantity of the annotated data. We test the degree of the detrimental effect caused by the annotation errors. We conclude that while the results deteriorate if annotations are erroneous; the effect – at least while using relatively homogeneous sequential video data – is limited. The benefits from the increased annotated data set size (created by using imperfect auto-annotation methods) outweighs the deterioration caused by annotated data.","PeriodicalId":432977,"journal":{"name":"2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Effects of annotation quality on model performance\",\"authors\":\"Khaled Alhazmi, Walaa Alsumari, Indrek Seppo, L. Podkuiko, Martin Simon\",\"doi\":\"10.1109/ICAIIC51459.2021.9415271\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Supervised machine learning generally requires pre-labelled data. 
Although there are several open access and pre-annotated datasets available for training machine learning algorithms, most contain a limited number of object classes, which may not be suitable for specific tasks. As previously available pre-annotated data is not usually sufficient for custom models, most of the real world applications require collecting and preparing training data. There is an obvious trade-off between annotation quality and quantity. Time and resources can be allocated for ensuring superior data quality or for increasing the quantity of the annotated data. We test the degree of the detrimental effect caused by the annotation errors. We conclude that while the results deteriorate if annotations are erroneous; the effect – at least while using relatively homogeneous sequential video data – is limited. The benefits from the increased annotated data set size (created by using imperfect auto-annotation methods) outweighs the deterioration caused by annotated data.\",\"PeriodicalId\":432977,\"journal\":{\"name\":\"2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAIIC51459.2021.9415271\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Artificial Intelligence in 
Information and Communication (ICAIIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAIIC51459.2021.9415271","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Effects of annotation quality on model performance
Supervised machine learning generally requires pre-labelled data. Although several open-access, pre-annotated datasets are available for training machine learning algorithms, most contain a limited number of object classes and may not suit specific tasks. Because previously available pre-annotated data is usually insufficient for custom models, most real-world applications require collecting and preparing training data. There is an obvious trade-off between annotation quality and quantity: time and resources can be spent either on ensuring superior data quality or on increasing the quantity of annotated data. We test the degree of the detrimental effect caused by annotation errors. We conclude that while results deteriorate when annotations are erroneous, the effect, at least when using relatively homogeneous sequential video data, is limited. The benefits of a larger annotated dataset (created with imperfect auto-annotation methods) outweigh the deterioration caused by the annotation errors.
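The paper's experiments (training on video data with erroneous annotations) are far richer than can be reproduced here, but the core idea of the quality-versus-quantity test, inject label noise into the training set, retrain, and compare held-out accuracy, can be sketched with a toy classifier. Everything below (the 1-D nearest-centroid model, the symmetric flip rates, the data generator) is an illustrative assumption, not the authors' setup:

```python
import random

random.seed(0)

def make_data(n):
    # Two classes as 1-D Gaussian clusters centred at -2 and +2.
    xs, ys = [], []
    for _ in range(n):
        y = random.randint(0, 1)
        xs.append(random.gauss(-2.0 if y == 0 else 2.0, 1.0))
        ys.append(y)
    return xs, ys

def flip_labels(ys, rate):
    # Simulate annotation errors by flipping a fraction of the labels.
    return [1 - y if random.random() < rate else y for y in ys]

def train_centroids(xs, ys):
    # "Training" = computing the mean of each (possibly mislabelled) class.
    c0 = sum(x for x, y in zip(xs, ys) if y == 0) / max(1, ys.count(0))
    c1 = sum(x for x, y in zip(xs, ys) if y == 1) / max(1, ys.count(1))
    return c0, c1

def accuracy(c0, c1, xs, ys):
    # Predict the class whose centroid is nearer; score against true labels.
    preds = [0 if abs(x - c0) < abs(x - c1) else 1 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

train_x, train_y = make_data(2000)
test_x, test_y = make_data(500)

for noise in (0.0, 0.1, 0.3):
    noisy_y = flip_labels(train_y, noise)
    c0, c1 = train_centroids(train_x, noisy_y)
    print(f"noise={noise:.1f}  test accuracy={accuracy(c0, c1, test_x, test_y):.3f}")
```

In this toy setting, symmetric label flipping pulls both centroids toward each other but leaves the decision boundary near zero, so test accuracy degrades only slightly even at 30% noise, loosely mirroring the abstract's conclusion that moderate annotation error has a limited effect.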