面向大数据分类的线性原对偶多实例支持向量机

2021 IEEE International Conference on Data Mining (ICDM) Pub Date : 2021-12-01 DOI:10.1109/ICDM51629.2021.00012

Lodewijk Brand, L. Baker, Carla Ellefsen, Jackson Sargent, Hua Wang

{"title":"面向大数据分类的线性原对偶多实例支持向量机","authors":"Lodewijk Brand, L. Baker, Carla Ellefsen, Jackson Sargent, Hua Wang","doi":"10.1109/ICDM51629.2021.00012","DOIUrl":null,"url":null,"abstract":"Multi-instance learning (MIL) is an area of machine learning that handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting and is able to classify bags which can contain any number of instances. This property allows MIL to be naturally applied to solve the problems in a wide variety of real-world applications from computer vision to healthcare. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper we present a novel Primal-Dual Multi-Instance Support Vector Machine (pdMISVM) derivation and implementation that can operate efficiently on large scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers (ADMM). The approach presented in this work is able to scale to large-scale data since it avoids iteratively solving quadratic programming problems which are generally used to optimize MIL algorithms based on SVMs. In addition, we modify our derivation to include an additional optimization designed to avoid solving a least-squares problem during our algorithm; this optimization increases the utility of our approach to handle a large number of features as well as bags. Finally, we apply our approach to synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method. We end our discussion with an extension of our approach to handle non-linear decision boundaries. Code and data for our methods are available online at: https://github.com/minds-mines/pdMISVM.jl.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A Linear Primal-Dual Multi-Instance SVM for Big Data Classifications\",\"authors\":\"Lodewijk Brand, L. Baker, Carla Ellefsen, Jackson Sargent, Hua Wang\",\"doi\":\"10.1109/ICDM51629.2021.00012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-instance learning (MIL) is an area of machine learning that handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting and is able to classify bags which can contain any number of instances. This property allows MIL to be naturally applied to solve the problems in a wide variety of real-world applications from computer vision to healthcare. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper we present a novel Primal-Dual Multi-Instance Support Vector Machine (pdMISVM) derivation and implementation that can operate efficiently on large scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers (ADMM). The approach presented in this work is able to scale to large-scale data since it avoids iteratively solving quadratic programming problems which are generally used to optimize MIL algorithms based on SVMs. In addition, we modify our derivation to include an additional optimization designed to avoid solving a least-squares problem during our algorithm; this optimization increases the utility of our approach to handle a large number of features as well as bags. Finally, we apply our approach to synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method. We end our discussion with an extension of our approach to handle non-linear decision boundaries. Code and data for our methods are available online at: https://github.com/minds-mines/pdMISVM.jl.\",\"PeriodicalId\":320970,\"journal\":{\"name\":\"2021 IEEE International Conference on Data Mining (ICDM)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Data Mining (ICDM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM51629.2021.00012\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM51629.2021.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

多实例学习(MIL)是机器学习的一个领域，它处理组织成实例集(称为包)的数据。传统上，MIL用于监督学习设置，并且能够对包含任意数量实例的包进行分类。这一特性使得MIL可以自然地应用于解决从计算机视觉到医疗保健的各种实际应用中的问题。然而，许多传统的MIL算法不能有效地扩展到大型数据集。在本文中，我们提出了一种新的原始-对偶多实例支持向量机(pdMISVM)的推导和实现，它可以有效地处理大规模数据。我们的方法依赖于使用乘法器交替方向法(ADMM)的多块变化衍生的算法。这项工作中提出的方法能够扩展到大规模数据，因为它避免了迭代求解二次规划问题，而二次规划问题通常用于优化基于支持向量机的MIL算法。此外，我们修改了我们的推导，包括一个额外的优化设计，以避免在我们的算法中解决最小二乘问题;这种优化增加了我们处理大量特征和包的方法的实用性。最后，我们将我们的方法应用于合成和现实世界的多实例数据集，以说明我们提出的方法的可扩展性，有希望的预测性能和可解释性。我们通过扩展处理非线性决策边界的方法来结束讨论。我们的方法的代码和数据可以在https://github.com/minds-mines/pdMISVM.jl上在线获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Linear Primal-Dual Multi-Instance SVM for Big Data Classifications

Multi-instance learning (MIL) is an area of machine learning that handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting and is able to classify bags which can contain any number of instances. This property allows MIL to be naturally applied to solve the problems in a wide variety of real-world applications from computer vision to healthcare. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper we present a novel Primal-Dual Multi-Instance Support Vector Machine (pdMISVM) derivation and implementation that can operate efficiently on large scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers (ADMM). The approach presented in this work is able to scale to large-scale data since it avoids iteratively solving quadratic programming problems which are generally used to optimize MIL algorithms based on SVMs. In addition, we modify our derivation to include an additional optimization designed to avoid solving a least-squares problem during our algorithm; this optimization increases the utility of our approach to handle a large number of features as well as bags. Finally, we apply our approach to synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method. We end our discussion with an extension of our approach to handle non-linear decision boundaries. Code and data for our methods are available online at: https://github.com/minds-mines/pdMISVM.jl.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE International Conference on Data Mining (ICDM)

自引率

0.00%

发文量