Incremental Feature Spaces Learning with Label Scarcity

ACM Transactions on Knowledge Discovery from Data (TKDD) Pub Date : 2022-06-27 DOI:10.1145/3516368

Shilin Gu, Yuhua Qian, Chenping Hou


Citations: 2

Abstract

Learning and mining from data streams with incremental feature spaces, where data may grow over time in both volume and feature dimension, have recently attracted extensive attention. Existing approaches usually assume that every incoming instance receives its true label. However, in many real-world applications, e.g., environment monitoring, acquiring true labels is costly because annotating the data requires human effort. To tackle this problem, we propose a novel incremental Feature spaces Learning with Label Scarcity (FLLS) algorithm, together with its two variants. When data streams arrive with augmented features, we first use margin-based online active learning to select valuable instances to be labeled, and thus build superior predictive models with minimal supervision. After receiving the labels, we combine the online passive-aggressive update rule and the margin-maximum principle to jointly update the dynamic classifier in the shared and augmented feature space. Finally, we use the projected truncation technique to build a sparse but efficient model. We theoretically analyze the error bounds of FLLS and its two variants. We also conduct experiments on synthetic data and real-world applications to further validate the effectiveness of the proposed algorithms.
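The three steps named in the abstract (margin-based label querying, a passive-aggressive update over a growing feature space, and truncation for sparsity) can be illustrated with a minimal Python sketch. This is not the FLLS algorithm itself: the abstract gives no formulas, so everything below is an assumption made for illustration. In particular, the query probability b / (b + |margin|) is the common randomized margin-based rule, the update is a plain PA-I step with new-feature weights initialized to zero rather than the paper's joint shared/augmented update, and the truncation simply keeps the largest-magnitude weights without a projection step. All function and parameter names (`should_query`, `pa_update`, `truncate`, `run_stream`, `b`, `C`, `keep_ratio`) are hypothetical.

```python
import random


def dot(w, x):
    """Inner product over the leading features that both vectors share."""
    return sum(wi * xi for wi, xi in zip(w, x))


def should_query(margin, b, rng):
    """Assumed rule: request a label with probability b / (b + |margin|)."""
    return rng.random() < b / (b + abs(margin))


def pa_update(w, x, y, C):
    """Plain PA-I step; weights of newly arrived features start at zero."""
    w = w + [0.0] * (len(x) - len(w))      # grow w to the new dimension
    loss = max(0.0, 1.0 - y * dot(w, x))   # hinge loss on this instance
    norm_sq = dot(x, x)
    if loss == 0.0 or norm_sq == 0.0:
        return w                           # passive: margin already satisfied
    tau = min(C, loss / norm_sq)           # aggressive step, capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def truncate(w, keep_ratio):
    """Simplified sparsification: zero all but the largest-magnitude weights."""
    k = max(1, int(len(w) * keep_ratio))
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def run_stream(stream, b=1.0, C=1.0, keep_ratio=0.5, seed=0):
    """Process (x, y) pairs whose feature vectors only ever grow in length."""
    rng = random.Random(seed)
    w, queried = [], 0
    for x, y in stream:
        margin = dot(w, x)                 # uses only features w already knows
        if should_query(margin, b, rng):   # the label y is read only if queried
            queried += 1
            w = truncate(pa_update(w, x, y, C), keep_ratio)
    return w, queried
```

A call such as `run_stream([([1.0], 1), ([1.0, -1.0], 1), ([-1.0, 2.0, 0.5], -1)])` returns the final weight list together with the number of labels requested. Under the assumed rule, instances with a small margin are queried almost surely, while confidently classified instances are mostly skipped, which is how the labeling budget is saved.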

