Grouped Learning: Group-By Model Selection Workloads
Side Li
Proceedings of the 2021 International Conference on Management of Data, 2021-06-09
DOI: 10.1145/3448016.3450576 (https://doi.org/10.1145/3448016.3450576)
Abstract
Machine learning (ML) is gaining popularity in many applications. Increasingly, companies prefer more targeted models for different subgroups of the population, such as locations, which helps improve accuracy. This practice is comparable to Group-By aggregation in SQL; we call it learning over groups. The data distribution of a smaller group is often simpler than that of the whole population, so a group-level model may offer higher accuracy in many cases. Non-technical business needs, such as privacy and regulatory compliance, may also necessitate group-level models. For instance, online advertising platforms would need to build disaggregated, partner-specific ML models, where all partner groups' training data are aggregated in one data pipeline.
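
The analogy to Group-By aggregation can be illustrated with a minimal sketch: partition the rows by a grouping column, then fit one model per partition instead of computing one aggregate per partition. This is not the paper's system. The helper names (`fit_line`, `learn_over_groups`), the column names, and the toy data are all hypothetical, and a one-variable least-squares line stands in for whatever model a real workload would train.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        # Degenerate group (all x equal): fall back to predicting the mean.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def learn_over_groups(rows, group_key):
    """Partition rows by group_key (like GROUP BY), then fit one model per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append((row["x"], row["y"]))
    return {group: fit_line(points) for group, points in groups.items()}


# Hypothetical toy data: two locations whose trends point in opposite directions.
rows = [
    {"location": "north", "x": 0.0, "y": 1.0},
    {"location": "north", "x": 1.0, "y": 3.0},
    {"location": "north", "x": 2.0, "y": 5.0},
    {"location": "south", "x": 0.0, "y": 4.0},
    {"location": "south", "x": 1.0, "y": 3.0},
    {"location": "south", "x": 2.0, "y": 2.0},
]

# One model per group, plus a single population-level model for comparison.
models = learn_over_groups(rows, "location")
global_model = fit_line([(r["x"], r["y"]) for r in rows])

for group, (slope, intercept) in sorted(models.items()):
    print(f"{group}: y = {slope:.2f} * x + {intercept:.2f}")
print(f"global: y = {global_model[0]:.2f} * x + {global_model[1]:.2f}")
```

In this toy data each group is exactly linear, so the per-group fits recover each trend, while the single population-level fit averages the two opposing trends and matches neither group well.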