Authors: Haoran Gao, Zhijun Ding
Venue: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC)
Published: 2022-12-15
DOI: https://doi.org/10.1109/ICNSC55942.2022.10004167
A Novel Machine Learning Method for Delayed Labels
Most machine learning research assumes that ground-truth labels become available immediately after prediction. In many applications, however, the labels arrive with a non-negligible delay. A delayed-label setting therefore contains a large amount of unlabeled data, which a supervised model cannot use. For this reason, most work on delayed labels trains semi-supervised models. That work, however, largely ignores the fact that the labels of the unlabeled data will arrive after several periods. As a result, neither supervised nor semi-supervised models alone handle delayed labels effectively. In addition, concept drift remains a problem because the data span a long period. In this paper, we propose an incremental learning model that adapts to delayed labels. First, we detect whether concept drift has occurred. Then we use knowledge distillation to update the supervised and semi-supervised models while retaining the knowledge learned from past labeled data. Finally, we combine the supervised and semi-supervised models to make predictions. We apply our algorithms to synthetic and real credit scoring datasets. The experimental results indicate that our algorithms are superior in the delayed-label setting.
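The abstract does not give the distillation loss or the rule used to combine the two models, so the following is only a minimal sketch under stated assumptions. It uses the commonly used temperature-softened distillation loss and a simple weighted average of class probabilities. The function names and the parameters `alpha`, `temperature`, and `weight` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Assumed loss: alpha * cross-entropy on the true label
    plus (1 - alpha) * T^2 * KL(teacher || student) on softened outputs."""
    # Hard term: the updated (student) model should fit the newly arrived label.
    student_probs = softmax(student_logits)
    hard = -math.log(max(student_probs[label], 1e-12))

    # Soft term: the student should stay close to the previous (teacher) model,
    # which is how knowledge from past labeled data is retained.
    t_soft = softmax(teacher_logits, temperature)
    s_soft = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / max(s, 1e-12))
               for t, s in zip(t_soft, s_soft) if t > 0)

    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


def combine_predictions(p_supervised, p_semi, weight=0.5):
    """Assumed combination rule: weighted average of class probabilities."""
    return [weight * a + (1 - weight) * b
            for a, b in zip(p_supervised, p_semi)]


if __name__ == "__main__":
    loss = distillation_loss([1.0, -1.0], [0.5, -0.5], label=0)
    print("distillation loss:", loss)
    print("combined:", combine_predictions([0.8, 0.2], [0.6, 0.4]))
```

In this sketch the teacher is the model from the previous period and the student is the model being updated once delayed labels arrive. A drift detector, which the abstract mentions as the first step, would decide how strongly to weight the soft term; that part is left out because the abstract does not describe it.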