{"title":"Combining vs. Transferring Knowledge: Investigating Strategies for Improving Demographic Inference in Low Resource Settings","authors":"Yaguang Liu, Lisa Singh","doi":"10.1145/3539597.3570462","DOIUrl":null,"url":null,"abstract":"For some learning tasks, generating a large labeled data set is impractical. Demographic inference using social media data is one such task. While different strategies have been proposed to mitigate this challenge, including transfer learning, data augmentation, and data combination, they have not been explored for the task of user level demographic inference using social media data. This paper explores two of these strategies: data combination and transfer learning. First, we combine labeled training data from multiple data sets of similar size to understand when the combination is valuable and when it is not. Using data set distance, we quantify the relationship between our data sets to help explain the performance of the combination strategy. Then, we consider supervised transfer learning, where we pretrain a model on a larger labeled data set, fine-tune the model on smaller data sets, and incorporate regularization as part of the transfer learning process. We empirically show the strengths and limitations of the proposed techniques on multiple Twitter data sets.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3539597.3570462","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
For some learning tasks, generating a large labeled data set is impractical. Demographic inference using social media data is one such task. While different strategies have been proposed to mitigate this challenge, including transfer learning, data augmentation, and data combination, they have not been explored for the task of user level demographic inference using social media data. This paper explores two of these strategies: data combination and transfer learning. First, we combine labeled training data from multiple data sets of similar size to understand when the combination is valuable and when it is not. Using data set distance, we quantify the relationship between our data sets to help explain the performance of the combination strategy. Then, we consider supervised transfer learning, where we pretrain a model on a larger labeled data set, fine-tune the model on smaller data sets, and incorporate regularization as part of the transfer learning process. We empirically show the strengths and limitations of the proposed techniques on multiple Twitter data sets.