Noise simulation in classification with the noisemodel R package: Applications analyzing the impact of errors with chemical data

José A. Sáez
Journal of Chemometrics (Journal Article), published 2023-03-06. DOI: 10.1002/cem.3472 (https://onlinelibrary.wiley.com/doi/10.1002/cem.3472). Citations: 2.

Abstract

Classification datasets created from chemical processes can be affected by errors, which impair the accuracy of the models built from them. This highlights the importance of analyzing the robustness of classifiers to different types and levels of noise, in order to understand how they behave when errors are present. In this context, noise models have been proposed to study noise-related phenomena in a controlled environment, allowing errors to be introduced into the data in a supervised manner. This paper introduces the noisemodel R package, which contains the first extensive implementation of noise models for classification datasets, and proposes it as a support tool to analyze the impact of errors in chemical data. It provides 72 noise models from the specialized literature that introduce errors in different ways into classes and attributes. Each model is documented and referenced, and their results are unified through a specific S3 class with customized print, summary, and plot methods. The usage of the package is illustrated through four application examples with real-world chemical datasets in which errors are prone to occur. The software will help deepen the understanding of the problem of noisy chemical data and support the development of new robust algorithms and noise preprocessing methods adapted to the different types of errors found in this scenario.
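
To make the idea of a noise model concrete, the sketch below shows two of the simplest kinds of error injection: symmetric uniform class noise, where a fixed fraction of labels is replaced by a different class chosen uniformly at random, and uniform attribute noise, where a fixed fraction of the values of one numeric attribute is replaced by random values drawn from that attribute's observed range. This is a minimal Python illustration under those assumptions. It is not the noisemodel package's R API. The function names `symmetric_label_noise` and `uniform_attribute_noise` are hypothetical, and the exact definitions of the 72 models in the package may differ.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def symmetric_label_noise(labels: Sequence[Hashable], level: float,
                          seed: int = 0) -> Tuple[List[Hashable], List[int]]:
    """Hypothetical sketch (not noisemodel's API): corrupt round(level * n)
    labels, replacing each with a different class chosen uniformly."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    rng = random.Random(seed)
    classes = list(dict.fromkeys(labels))  # distinct classes, input order
    n_noisy = round(level * len(labels))
    if n_noisy > 0 and len(classes) < 2:
        raise ValueError("class noise needs at least two classes")
    noisy = list(labels)
    # Choose which samples to corrupt, without replacement.
    idx = sorted(rng.sample(range(len(labels)), n_noisy))
    for i in idx:
        # Any *other* class is equally likely to replace the true one.
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy, idx


def uniform_attribute_noise(values: Sequence[float], level: float,
                            seed: int = 0) -> Tuple[List[float], List[int]]:
    """Hypothetical sketch: replace round(level * n) values of one numeric
    attribute with uniform draws from the attribute's [min, max] range."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    noisy = list(values)
    idx = sorted(rng.sample(range(len(values)), round(level * len(values))))
    for i in idx:
        noisy[i] = rng.uniform(lo, hi)
    return noisy, idx


if __name__ == "__main__":
    y = ["active", "inactive"] * 5
    y_noisy, flipped = symmetric_label_noise(y, level=0.2, seed=42)
    print(len(flipped))  # 2 labels corrupted (20% of 10)
    x = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2, 0.9, 0.5]
    x_noisy, changed = uniform_attribute_noise(x, level=0.25, seed=7)
    print(changed, x_noisy)
```

Fixing the number of corrupted samples at round(level × n), instead of corrupting each sample independently with probability `level`, makes the noise level exact on every run. Returning the corrupted indices allows the clean and noisy versions of a dataset to be compared when measuring a classifier's robustness.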
