Finding Optimal Normalizing Transformations via bestNormalize

R J. (The R Journal). Published: 2021-01-01. DOI: 10.32614/rj-2021-041

Ryan A. Peterson


Citations: 111

Abstract




The bestNormalize R package was designed to help users find a transformation that can effectively normalize a vector regardless of its actual distribution. Each of the many normalization techniques that have been developed has its own strengths and weaknesses, and deciding which one to use before the data are fully observed is difficult or impossible. This package facilitates choosing among a range of candidate transformations and automatically returns the best one, i.e., the one that makes the data look the most normal. To evaluate and compare normalization efficacy across a suite of candidate transformations, we developed a statistic based on a goodness-of-fit test statistic divided by its degrees of freedom. Transformations can be trained and then seamlessly applied to newly observed data, and they can be used in conjunction with the caret and recipes packages for data preprocessing in machine learning workflows. Custom transformations and normalization statistics are supported.
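The abstract's selection idea (score each candidate transformation with a goodness-of-fit statistic divided by its degrees of freedom, keep the lowest, then reuse the fitted transformation on new data) can be illustrated with a short sketch. It is written in Python rather than R and is not the bestNormalize API. The specific choices are our assumptions: a Pearson chi-square test on k equiprobable N(0, 1) bins with k = ceil(2 n^(2/5)) (borrowed from the default of R's nortest::pearson.test), df = k - 3 (two estimated parameters), and a tiny hypothetical candidate set. The names `pearson_p_over_df`, `CANDIDATES`, and `fit_best` are hypothetical. Because a chi-square variable has mean equal to its degrees of freedom, values of P/df near 1 are consistent with normality under this test, and larger values indicate worse fit.

```python
import bisect
import math
import random
from statistics import NormalDist, mean, stdev


def pearson_p_over_df(x):
    """Pearson chi-square normality statistic divided by its degrees of freedom.

    Assumptions (not taken from the paper): x is standardized with its own
    mean and sd, binned into k equiprobable N(0, 1) bins with
    k = ceil(2 * n**0.4), and df = k - 3 (mean and sd were estimated).
    """
    n = len(x)
    k = math.ceil(2 * n ** 0.4)
    df = k - 3
    if df < 1:
        raise ValueError("too few observations for this binning rule")
    mu, sd = mean(x), stdev(x)
    if sd == 0:
        raise ValueError("constant input cannot be scored")
    # Interior cut points so that each bin has probability 1/k under N(0, 1).
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    counts = [0] * k
    for v in x:
        counts[bisect.bisect_right(cuts, (v - mu) / sd)] += 1
    expected = n / k
    p_stat = sum((c - expected) ** 2 / expected for c in counts)
    return p_stat / df


# Hypothetical candidate set; the real package offers many more transformations.
CANDIDATES = {
    "identity": lambda v: v,
    "log": math.log,      # defined only for v > 0
    "sqrt": math.sqrt,    # defined only for v >= 0
    "arcsinh": math.asinh,
}


def fit_best(x):
    """Score every applicable candidate on x and keep the lowest P/df.

    Returns (name, statistic, predict). predict applies the chosen
    transformation to new data and standardizes it with the training mean
    and sd, so the fitted transformation can be reused on unseen values.
    """
    best = None
    for name, f in CANDIDATES.items():
        try:
            y = [f(v) for v in x]
        except ValueError:  # e.g. log(0) or sqrt of a negative value
            continue
        stat = pearson_p_over_df(y)
        if best is None or stat < best[0]:
            best = (stat, name, f, mean(y), stdev(y))
    stat, name, f, mu, sd = best

    def predict(new_x):
        # Reuse the parameters learned on the training data.
        return [(f(v) - mu) / sd for v in new_x]

    return name, stat, predict


if __name__ == "__main__":
    rng = random.Random(1)
    train = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
    name, stat, predict = fit_best(train)
    print(name, round(stat, 2))
    print(predict([0.5, 1.0, 4.0]))  # transform newly observed values
```

For log-normal training data one would expect the log candidate to win. The sketch scores candidates in-sample for brevity; the package's own estimation procedure may differ (for example, it can estimate the statistic out of sample), so treat this as an illustration of the selection principle only.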

