Haipeng Chen, Fuhai Xiong, Dihong Wu, Lingxiang Zheng, Ao Peng, Xuemin Hong, Biyu Tang, Hai Lu, Haibin Shi, Huiru Zheng
Published in: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2017-12-18
DOI: 10.1109/BIBM.2017.8217821 (https://doi.org/10.1109/BIBM.2017.8217821)
Assessing impacts of data volume and data set balance in using deep learning approach to human activity recognition
Over the past decade, deep learning has developed rapidly and has had a significant impact on a variety of application domains. In recent years, it has been applied to human activity recognition as a substitute for well-established analysis techniques that rely on handcrafted feature extraction and classification methods. However, little attention has been paid to the influence of training data on recognition accuracy. In this paper, we assess how data volume and data balance affect human activity recognition when using deep learning approaches. We evaluate the relationship between the volume of the training dataset and the prediction accuracy of deep learning algorithms. Because the balance of data between activity categories affects recognition accuracy, we modified the SMOTE algorithm so that it can be applied to human activity recognition. Results show that when the data volume is small (<4M), recognition accuracy increases quickly as the quantity of training data grows. However, the growth in recognition accuracy slows once the data quantity reaches 4 million, and further increasing the data volume does not significantly improve activity recognition performance. We therefore conclude that a data volume of 4 million ensures sufficient accuracy for human activity recognition. Meanwhile, balancing the data set not only improves the recognition accuracy of minority categories but also helps to increase the overall accuracy.
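To make the balancing step concrete, the following is a minimal sketch of the standard SMOTE interpolation idea in plain Python. It is not the modified variant described in the abstract, whose details are not given here. The function name `smote_oversample`, its parameters, and the toy feature vectors are all assumptions introduced for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from minority-class feature vectors.

    Standard SMOTE (an illustrative sketch, not the paper's modified version):
    pick a minority sample, pick one of its k nearest minority neighbours,
    and interpolate at a random point on the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by Euclidean distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors for a rare activity class (illustrative only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_oversample(minority, n_new=6, k=2)
```

Each synthetic sample lies on a line segment between two existing minority samples, so the minority class grows without exact duplication. How such interpolation should be adapted to sensor time series is the part the paper's modification addresses, and it is not reproduced here.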