Adaptive Recruitment Resource Allocation to Improve Cohort Representativeness in Participatory Biomedical Datasets
Victor Borza, Andrew Estornell, Ellen Wright Clayton, Chien-Ju Ho, Russell Rothman, Yevgeniy Vorobeychik, Bradley Malin
arXiv - CS - Computers and Society, 2024-08-02, https://doi.org/arxiv-2408.01375
Abstract
Large participatory biomedical studies, which recruit individuals to
join a dataset, are gaining popularity and investment, especially for analysis
by modern AI methods. Because they purposively recruit participants, these
studies are uniquely able to address a lack of historical representation, an
issue that has affected many biomedical datasets. In this work, we define
representativeness as the similarity of a cohort's distribution over a set of
attributes to that of a target population. Our goal is to mirror the U.S.
population across distributions of age, gender, race, and ethnicity. Many
participatory studies
recruit at several institutions, so we introduce a computational approach to
adaptively allocate recruitment resources among sites to improve
representativeness. In simulated recruitment of 10,000-participant cohorts from
medical centers in the STAR Clinical Research Network, we show that our
approach yields a more representative cohort than existing baselines. Thus, we
highlight the value of computational modeling in guiding recruitment efforts.
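
The abstract names neither the similarity measure nor the allocation algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes KL divergence as the similarity measure and a simple greedy rule that assigns each batch of recruitment slots to the site whose expected recruits bring the cohort closest to the target. The site names, attribute groups, and all numbers are hypothetical.

```python
import math

def kl_divergence(target, cohort, eps=1e-9):
    """KL(target || cohort); lower means the cohort is closer to the target.
    KL is an assumed metric, not one stated in the abstract."""
    return sum(p * math.log(p / max(cohort.get(k, 0.0), eps))
               for k, p in target.items() if p > 0)

def normalize(counts):
    """Turn expected counts per group into proportions."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def allocate_greedily(site_mix, target, budget, batch=100):
    """Greedy sketch: hand out `budget` slots in batches, each time picking
    the site that minimizes the divergence of the resulting expected cohort."""
    cohort = {k: 0.0 for k in target}        # expected recruits per group
    allocation = {s: 0 for s in site_mix}    # slots assigned per site
    for _ in range(budget // batch):
        def score(site):
            trial = {k: cohort[k] + batch * site_mix[site].get(k, 0.0)
                     for k in cohort}
            return kl_divergence(target, normalize(trial))
        best = min(site_mix, key=score)
        for k in cohort:
            cohort[k] += batch * site_mix[best].get(k, 0.0)
        allocation[best] += batch
    return allocation, normalize(cohort)

# Hypothetical two-group target and two sites with different patient mixes.
target = {"A": 0.6, "B": 0.4}
site_mix = {
    "site_1": {"A": 0.9, "B": 0.1},
    "site_2": {"A": 0.3, "B": 0.7},
}
allocation, mix = allocate_greedily(site_mix, target, budget=1000)
```

In this toy case neither site matches the target on its own, but an even split between them does (0.5 × 0.9 + 0.5 × 0.3 = 0.6), which is the kind of trade-off an allocation procedure can exploit. A real study would also have to handle uncertain recruitment yields and many more attribute combinations, which this sketch ignores.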