A Simple-to-Use R Package for Mimicking Study Data by Simulations.

IF 1.3 | Zone 4, Medicine | Q3, Computer Science, Information Systems
Giorgos Koliopanos, Francisco Ojeda, Andreas Ziegler
Methods of Information in Medicine, volume 62, issue 3-04, pages 119-129. Published 2023-09-01.
DOI: https://doi.org/10.1055/a-2048-7692
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/75/40/10-1055-a-2048-7692.PMC10462429.pdf

Abstract

Background: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome such legal restrictions, researchers can instead transfer simulated data that mimic the structure of the existing study data but differ from them.

Objectives: The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo), which can be used to simulate data from existing study data with continuous, ordinal categorical, and dichotomous variables.

Methods: The core idea is to combine the rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal distribution and transformed back to the original scale of the variables. Unique features of modgo are that it allows changing the correlation between variables, performing perturbation analysis, handling multicenter data, and changing inclusion/exclusion criteria by selecting specific values of one variable or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.
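
The pipeline described in the Methods can be illustrated with a minimal sketch. This is not modgo's implementation or API: it is a standard-library Python illustration for two variables only, and every function name in it (`rank_inverse_normal`, `pearson`, `back_transform`, `simulate_pair`) is hypothetical. It assumes the rank-based score z = Phi^-1((r - 0.5) / n), a plain Pearson correlation on the normal scores (a simplification for ordinal and dichotomous variables), and back-transformation through the empirical quantiles of the original data.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def rank_inverse_normal(values):
    """Replace each value by the normal score of its (tie-averaged) rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Assumed offset: (rank - 0.5) / n keeps probabilities strictly in (0, 1).
    return [STD_NORMAL.inv_cdf((r - 0.5) / n) for r in ranks]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def back_transform(z, original_sorted):
    """Map a normal score back to the original scale via empirical quantiles."""
    n = len(original_sorted)
    idx = min(int(STD_NORMAL.cdf(z) * n), n - 1)
    return original_sorted[idx]


def simulate_pair(x, y, n_sim, seed=0):
    """Simulate n_sim (x, y) pairs that mimic the correlation of the inputs."""
    rng = random.Random(seed)
    rho = pearson(rank_inverse_normal(x), rank_inverse_normal(y))
    resid_sd = math.sqrt(max(0.0, 1.0 - rho * rho))  # guard against rounding
    xs, ys = sorted(x), sorted(y)
    sims = []
    for _ in range(n_sim):
        # Bivariate normal draw with correlation rho (2x2 Cholesky factor).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + resid_sd * rng.gauss(0.0, 1.0)
        sims.append((back_transform(z1, xs), back_transform(z2, ys)))
    return sims


# Toy input: a continuous variable and a dichotomous variable.
age = [34.0, 41.5, 47.2, 52.8, 58.1, 63.4, 66.0, 71.9]
event = [0, 0, 0, 1, 0, 1, 1, 1]
mock = simulate_pair(age, event, n_sim=5, seed=42)
print(mock)
```

Because the back-transformation draws from the empirical quantiles, every simulated value in this sketch is one that occurs in the original data; only the combinations of values across variables are new. Extending the idea to more than two variables would require a full correlation matrix and its Cholesky factor instead of the single coefficient `rho`.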

Results: modgo mimicked the structure of the original study data. Results of modgo were similar to those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated on several expansions.

Conclusion: The R package modgo is useful when existing study data may not be shared. Its perturbation expansion makes it possible to simulate truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.

Source journal
Methods of Information in Medicine (Medicine; Computer Science, Information Systems)
CiteScore: 3.70
Self-citation rate: 11.80%
Articles published: 33
Review time: 6-12 weeks
Journal description: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal issue.