PARTITIONING AROUND MEDOIDS CLUSTERING AND RANDOM FOREST CLASSIFICATION FOR GIS-INFORMED IMPUTATION OF FLUORIDE CONCENTRATION DATA.

Impact factor 1.3; CAS journal ranking tier 4 (Mathematics); JCR Q2, STATISTICS & PROBABILITY
Yu Gu, John S Preisser, Donglin Zeng, Poojan Shrestha, Molina Shah, Miguel A Simancas-Pallares, Jeannie Ginnis, Kimon Divaris
Annals of Applied Statistics, vol. 16, no. 1, pp. 551-572. Published 2022-03-01. DOI: 10.1214/21-aoas1516
Full text (PMC8963777): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963777/pdf/nihms-1731052.pdf
Citations: 2

Abstract

Community water fluoridation is an important component of oral health promotion, as fluoride is a well-documented caries-preventive agent. Direct measurements of domestic water fluoride content provide valuable information about individuals' fluoride exposure and thus caries risk; however, such measurements are logistically challenging to carry out at scale in oral health research. This article describes the development and evaluation of a novel method, informed by spatial autocorrelation, for imputing missing domestic water fluoride concentration data. The context is a statewide epidemiologic study of pediatric oral health in North Carolina, in which domestic water fluoride concentration was missing for approximately 75% of study participants with clinical data on dental caries. A new machine-learning-based imputation method that combines partitioning around medoids clustering and random forest classification (PAMRF) is developed and implemented. Imputed values are filtered according to an allowable error rate or a target sample size, depending on the requirements of each application. In leave-one-out cross-validation and simulation studies, PAMRF outperforms four existing imputation approaches: two conventional spatial interpolation methods (inverse-distance weighting, IDW, and universal kriging, UK) and two supervised learning methods (k-nearest neighbors, KNN, and classification and regression trees, CART). Including multiply imputed values in the estimation of the association between fluoride concentration and dental caries prevalence resulted in essentially no change in the PAMRF-based estimates but substantial gains in precision, owing to the larger effective sample size. PAMRF is a powerful new method for imputing missing fluoride values where geographic information is available.
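The abstract describes PAMRF only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that (1) PAM clusters the monitored sites on their coordinates plus a weighted fluoride value, (2) a random forest trained on coordinates alone predicts the cluster of each unmonitored site, (3) the fluoride value of the predicted cluster's medoid is imputed, and (4) the forest's vote share serves as the confidence score used to filter imputations by a cutoff (standing in for an allowable error rate) or by a target sample size. Every name and parameter here (`pam`, `grow_tree`, `pamrf_impute`, `filter_imputations`, `f_weight`, `min_vote`, `target_n`, `n_trees`, `max_depth`) is hypothetical. The forest is a small hand-rolled version (bootstrap resampling, one randomly chosen feature per split) so that the example needs only the Python standard library.

```python
# Hypothetical PAMRF-style sketch (standard library only); not the authors' code.
import math
import random
from collections import Counter

def pam(points, k, max_iter=100):
    """Partitioning around medoids: greedy BUILD, then best-improvement SWAP."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    def cost(meds):  # total distance from each point to its nearest medoid
        return sum(min(d[i][m] for m in meds) for i in range(n))
    medoids = []
    for _ in range(k):  # BUILD: add the point that lowers total cost the most
        medoids.append(min((j for j in range(n) if j not in medoids),
                           key=lambda j: cost(medoids + [j])))
    current = cost(medoids)
    for _ in range(max_iter):  # SWAP: apply the best medoid/non-medoid exchange
        best_cost, best_meds = current, None
        candidates = [h for h in range(n) if h not in medoids]
        for a in range(k):
            for h in candidates:
                trial = medoids[:a] + [h] + medoids[a + 1:]
                c = cost(trial)
                if c < best_cost - 1e-12:
                    best_cost, best_meds = c, trial
        if best_meds is None:  # no swap lowers the cost: converged
            break
        medoids, current = best_meds, best_cost
    labels = [min(range(k), key=lambda a: d[i][medoids[a]]) for i in range(n)]
    return medoids, labels

def gini(y):
    return 1.0 - sum((c / len(y)) ** 2 for c in Counter(y).values())

def grow_tree(X, y, rng, depth=0, max_depth=6):
    """CART-style Gini tree that considers one randomly chosen feature per split."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority  # leaf: predicted cluster label
    f = rng.randrange(len(X[0]))
    vals = sorted({row[f] for row in X})
    best = None
    for t in ((a + b) / 2 for a, b in zip(vals, vals[1:])):  # midpoint thresholds
        left = [i for i, row in enumerate(X) if row[f] <= t]
        right = [i for i, row in enumerate(X) if row[f] > t]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right]))
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:  # the chosen feature is constant in this node
        return majority
    _, t, left, right = best
    kids = [grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, depth + 1, max_depth)
            for idx in (left, right)]
    return (f, t, kids[0], kids[1])

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def random_forest(X, y, n_trees=50, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def forest_vote(forest, x):
    """Majority-vote label and the share of trees that cast that vote."""
    label, count = Counter(predict_tree(t, x) for t in forest).most_common(1)[0]
    return label, count / len(forest)

def pamrf_impute(obs_xy, obs_f, miss_xy, k, f_weight=1.0):
    """Hypothetical: cluster monitored sites, then classify unmonitored ones by location."""
    points = [(x, y, f_weight * f) for (x, y), f in zip(obs_xy, obs_f)]
    medoids, labels = pam(points, k)  # clusters reflect both location and fluoride
    forest = random_forest(obs_xy, labels)  # predicts cluster from coordinates only
    results = []
    for j, xy in enumerate(miss_xy):
        cluster, share = forest_vote(forest, xy)
        results.append((j, obs_f[medoids[cluster]], share))  # impute the medoid's value
    return results

def filter_imputations(results, min_vote=None, target_n=None):
    """Keep imputations above a vote-share cutoff and/or the target_n most confident."""
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    if min_vote is not None:
        ranked = [r for r in ranked if r[2] >= min_vote]
    if target_n is not None:
        ranked = ranked[:target_n]
    return {j: value for j, value, _ in ranked}
```

As a usage example, `filter_imputations(pamrf_impute(obs_xy, obs_f, miss_xy, k=2), min_vote=0.8)` keeps only the unmonitored sites for which at least 80% of trees agree on a cluster, while passing `target_n` instead keeps the N most confident imputations. Tying the cutoff to an actual allowable error rate would require calibrating it, for example by leave-one-out cross-validation on the monitored sites, which this sketch does not attempt. A practical version would more likely call existing PAM and random forest implementations (for example, R's `cluster` and `randomForest` packages); which software the authors used is not stated here.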

Source journal: Annals of Applied Statistics
Subject category: Social Sciences / Statistics & Probability
CiteScore: 3.10
Self-citation rate: 5.60%
Articles published: 131
Review time: 6-12 weeks
About the journal: Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. The journal is published quarterly in both print and electronic form, and its goal is to provide a timely and unified forum for all areas of applied statistics.