Sampling May Not Always Increase Detector Performance: A Study on Collecting Training Examples
Authors: Jun Liu, Shuang Lai
Published in: Proceedings of the 2021 5th International Conference on Computer Science and Artificial Intelligence, 2021-12-04
DOI: https://doi.org/10.1145/3507548.3507568
Computer vision research has attracted considerable attention in recent years. However, the image data available for training computer vision models is very limited, so an effective method is needed to expand datasets using existing image data. In this paper, we study methods for collecting more training data from existing datasets and compare the performance of detectors trained on datasets generated by different methods. One method performs sampling based on the statistical properties of feature descriptors. For every feature, the underlying assumption is that a probability density function (PDF) exists; this PDF is approximated from the existing training examples, and new training examples are sampled from the approximated PDF. The other method simply expands the existing datasets by flipping each training example along its axis of symmetry. Locally Adaptive Regression Kernel (LARK) features are used in this paper because they are robust against illumination changes and noise. Our experimental results demonstrate that an expanded training dataset is not always preferable, even if the expanded dataset includes all of the original training data.
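The two expansion strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the per-feature PDF is approximated, so the code assumes an independent Gaussian fitted to each feature dimension, and it assumes the axis of symmetry is vertical, so flipping means mirroring left to right. The function names `sample_from_feature_pdfs`, `flip_horizontal`, and `expand_by_flipping` are hypothetical, and LARK feature extraction is not shown.

```python
import random
import statistics


def sample_from_feature_pdfs(features, n_new, seed=0):
    """Draw n_new synthetic feature vectors.

    Assumption (not from the paper): each feature dimension is modeled
    as an independent Gaussian whose mean and standard deviation are
    estimated from the existing training examples.
    `features` is a list of equal-length feature vectors (at least two).
    """
    rng = random.Random(seed)
    columns = list(zip(*features))  # one tuple of values per feature
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    return [
        [rng.gauss(mu, sigma) for mu, sigma in zip(means, stds)]
        for _ in range(n_new)
    ]


def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) about its vertical axis."""
    return [row[::-1] for row in image]


def expand_by_flipping(images):
    """Return the original images followed by their mirrored copies."""
    return images + [flip_horizontal(img) for img in images]


if __name__ == "__main__":
    feats = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(len(sample_from_feature_pdfs(feats, n_new=5)))
    print(expand_by_flipping([[[1, 2, 3], [4, 5, 6]]]))
```

Both functions keep the original examples available alongside the new ones, which matches the setting in which the abstract reports that the expanded dataset is not always preferable. A different density estimator, such as a histogram or kernel density estimate, could replace the Gaussian without changing the overall structure.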