On Data Summarization for Machine Learning in Multi-organization Federations
Authors: Bongjun Ko, Shiqiang Wang, T. He, D. Conway-Jones
Published in: 2019 IEEE International Conference on Smart Computing (SMARTCOMP), 2019-06-01
DOI: 10.1109/SMARTCOMP.2019.00030 (https://doi.org/10.1109/SMARTCOMP.2019.00030)
Citations: 5
Abstract
Machine learning is a promising technology for many modern applications. Training an effective machine learning model requires a large amount of data. However, data may be created in different organizations, and sharing it across organizational boundaries is difficult because of privacy concerns and communication bandwidth limitations. Data summarization reduces the amount of data that needs to be shared while preserving the characteristics of the data that are useful for training machine learning models. In this paper, we present an overview of data summarization techniques that can be useful for machine learning across organizational boundaries. We also discuss possible applications of these techniques and challenges for future research.
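
As a rough illustration of the general idea (not a technique taken from the paper), an organization could replace its local data set with a handful of weighted representative points and share only those. The sketch below uses a plain k-means clustering for this purpose. The function name `summarize`, the 2-D point type, and the parameters `k`, `iters`, and `seed` are all assumptions made for this example.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def summarize(points: List[Point], k: int, iters: int = 20,
              seed: int = 0) -> List[Tuple[Point, int]]:
    """Hypothetical helper: compress `points` into k (centroid, weight) pairs.

    Each weight is the number of local points the centroid stands for,
    so only k weighted points would need to leave the organization.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assignment = [0] * len(points)

    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        for i, (x, y) in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2,
            )
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )

    weights = [assignment.count(j) for j in range(k)]
    return list(zip(centroids, weights))


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic local data: two Gaussian blobs (illustrative only).
    local_data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
    local_data += [(gen.gauss(5, 1), gen.gauss(5, 1)) for _ in range(100)]
    for centroid, weight in summarize(local_data, k=4):
        print(centroid, weight)
```

In this sketch the weights sum to the number of local points, and the weighted mean of the centroids equals the mean of the original data, so at least that first-order characteristic survives the compression. Whether such a summary also addresses the privacy concerns mentioned in the abstract is a separate question that the sketch does not attempt to answer.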