OTrecod: An R Package for Data Fusion using Optimal Transportation Theory

R J. Pub Date: 2023-02-10 DOI: 10.32614/rj-2023-006
G. Guernec, Valérie Garès, J. Omer, Philippe Saint-Pierre, N. Savy
Volume 43 1, pages 195-222. Publication type: Journal Article.

Abstract

Advances in information technology often confront users with large amounts of data that must be integrated easily. In this context, creating a single database from multiple separate data sources is an attractive but complex task when the same information of interest is stored in at least two distinct encodings. In this situation, merging the data sources consists of finding a common recoding scale to fill in the incomplete information in a synthetic database. The OTrecod package provides R users with two functions dedicated to solving this recoding problem using optimal transportation theory. Specific arguments of these functions enrich the algorithms by relaxing distributional constraints or adding a regularization term to make the data fusion more flexible. The OTrecod package also provides a set of support functions dedicated to the harmonization of separate data sources, the handling of incomplete information, and the selection of matching variables. This paper gives all the keys needed to quickly understand and master the original algorithms implemented in the OTrecod package, guiding users step by step through their data fusion project.
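
To make the recoding idea concrete, here is a minimal toy sketch in Python. It is not the OTrecod API and not the package's algorithm. It assumes two small databases that share a single numeric covariate `x`, where database A records the outcome on a two-level scale and database B records it on a three-level scale. The function names (`level_stats`, `monotone_plan`, `conditional`) and the data are hypothetical. The cost of matching two levels is assumed to be the absolute difference between their mean covariate values; for a one-dimensional cost of this kind, the monotone (sorted) coupling is an optimal transport plan, so no linear programming solver is needed here.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def level_stats(records):
    """Map each level to (relative frequency, mean covariate) from (level, x) pairs."""
    counts, sums = Counter(), defaultdict(float)
    for level, x in records:
        counts[level] += 1
        sums[level] += x
    n = len(records)
    return {lv: (Fraction(counts[lv], n), sums[lv] / counts[lv]) for lv in counts}


def monotone_plan(stats_a, stats_b):
    """Couple the two level distributions after sorting levels by mean covariate.

    Assumption: cost = |mean_x(level in A) - mean_x(level in B)|, for which
    this sorted, greedy filling is an optimal transport plan.
    """
    a = sorted(stats_a.items(), key=lambda kv: kv[1][1])
    b = sorted(stats_b.items(), key=lambda kv: kv[1][1])
    plan = {}
    i = j = 0
    rem_a, rem_b = a[0][1][0], b[0][1][0]
    while i < len(a) and j < len(b):
        moved = min(rem_a, rem_b)  # mass sent from level a[i] to level b[j]
        if moved > 0:
            key = (a[i][0], b[j][0])
            plan[key] = plan.get(key, 0) + moved
        rem_a -= moved
        rem_b -= moved
        if rem_a == 0:  # source level exhausted, move to the next one
            i += 1
            rem_a = a[i][1][0] if i < len(a) else Fraction(0)
        if rem_b == 0:  # target level filled, move to the next one
            j += 1
            rem_b = b[j][1][0] if j < len(b) else Fraction(0)
    return plan


def conditional(plan, y):
    """Estimated distribution of the B-scale level given the A-scale level y."""
    row = {z: m for (yy, z), m in plan.items() if yy == y}
    total = sum(row.values())
    return {z: m / total for z, m in row.items()}


# Hypothetical data: (outcome level, covariate x) for each individual.
db_a = [("low", 1.0), ("low", 2.0), ("high", 5.0), ("high", 6.0)]
db_b = [("z1", 1.0), ("z2", 3.0), ("z2", 4.0), ("z3", 6.0)]

plan = monotone_plan(level_stats(db_a), level_stats(db_b))
for (y, z), mass in sorted(plan.items()):
    print(y, "->", z, mass)
print(conditional(plan, "low"))
```

The resulting plan can be read as an estimate of the joint distribution of the two encodings, and `conditional` turns one of its rows into a recoding rule for individuals observed only on the A scale. Real data with several covariates would need a general cost matrix and a linear programming solver rather than this sorted shortcut.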