Data Quality: Importance of the ‘before analysis’ domain (Theory of Sampling, TOS)

IF 2.3 4区 化学 Q1 SOCIAL WORK
{"title":"Data Quality: Importance of the ‘before analysis’ domain (Theory of Sampling, TOS)","authors":"","doi":"10.1002/cem.70025","DOIUrl":null,"url":null,"abstract":"<p>Data analysts/chemometricians are part of a scientific collegium covering three distinct domains: i) sampling – ii) analysis – iii) data modelling, which are collectively influencing ‘data quality’. There is much more to data quality than analytical uncertainty. There are many situations where <i>analysis</i> is to be made of heterogeneous materials/batches/lots/flowing streams, which need to be <i>sampled</i> appropriately before analysis, following an often long and complex pathway ‘from-lot-to-aliquot’. In most cases, sampling and sub-sampling will <i>dominate</i> the total Measurement Uncertainty budget (MU<sub>total</sub>). Left-out MU<sub>sampling</sub> contributions may easily overwhelm the Total Analytical Error (TAE) uncertainty by factors 5, 10, 25 or <i>higher</i> as a function of the specific heterogeneity characteristics of the materials and systems targeted, and of the sampling procedure used (grab vs. composite sampling). Focus is here on the consequences of unwittingly ignoring the uncertainties originating in these domains, which e.g. will influence adversely on bilinear component directions (reducing model <i>accuracy</i>) as well as RMSE estimates reflecting <i>precision</i> (analyte concentration prediction, classification, time series prediction) and along the way will also clear up an evergreen mistake: contrary to many beliefs, ‘more data’ will <span>not</span> automatically reduce the magnitude of an unsatisfactory performance RMSE. It is shown how the Theory of Sampling (TOS) is the only guarantor of representative sampling in the critical ‘before analysis’ domain. This article introduces the essential minimum TOS competence which must be mastered by stakeholders from all three domains. The conceptual elements in the TOS <i>system</i> can be visualised as a graphic overview:</p><p>Kim H. Esbensen has been professor at three universities (National Geological Survey of Denmark and Greenland (2010–2015), Aalborg University, Denmark (2001–2010), Telemark Institute of Technology, Norway (1990–2000) and professeur associé, Université du Québec à Chicoutimi before switching to a quest as an independent consultant in 2015. He is a member of several scientific societies and has published widely across several scientific fields. He is the author of a widely used textbook in Multivariate Data Analysis (chemometrics), and in 2020 published: “Introduction to the Theory and Practice of Sampling”. He was chairman of the taskforce responsible for the world's first horizontal (matrix-independent) sampling standard DS 3077:2024 - Esbensen is the founding editor of: “Sampling Science and Technology (SST)” - https://www.sst-magazine.info/issues/ He can be reached at his homepage https://kheconsult.com/</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 4","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70025","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemometrics","FirstCategoryId":"92","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cem.70025","RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SOCIAL WORK","Score":null,"Total":0}
引用次数: 0

Abstract

Data analysts/chemometricians are part of a scientific collegium covering three distinct domains: i) sampling – ii) analysis – iii) data modelling, which are collectively influencing ‘data quality’. There is much more to data quality than analytical uncertainty. There are many situations where analysis is to be made of heterogeneous materials/batches/lots/flowing streams, which need to be sampled appropriately before analysis, following an often long and complex pathway ‘from-lot-to-aliquot’. In most cases, sampling and sub-sampling will dominate the total Measurement Uncertainty budget (MUtotal). Left-out MUsampling contributions may easily overwhelm the Total Analytical Error (TAE) uncertainty by factors 5, 10, 25 or higher as a function of the specific heterogeneity characteristics of the materials and systems targeted, and of the sampling procedure used (grab vs. composite sampling). Focus is here on the consequences of unwittingly ignoring the uncertainties originating in these domains, which e.g. will influence adversely on bilinear component directions (reducing model accuracy) as well as RMSE estimates reflecting precision (analyte concentration prediction, classification, time series prediction) and along the way will also clear up an evergreen mistake: contrary to many beliefs, ‘more data’ will not automatically reduce the magnitude of an unsatisfactory performance RMSE. It is shown how the Theory of Sampling (TOS) is the only guarantor of representative sampling in the critical ‘before analysis’ domain. This article introduces the essential minimum TOS competence which must be mastered by stakeholders from all three domains. The conceptual elements in the TOS system can be visualised as a graphic overview:

Kim H. Esbensen has been professor at three universities (National Geological Survey of Denmark and Greenland (2010–2015), Aalborg University, Denmark (2001–2010), Telemark Institute of Technology, Norway (1990–2000) and professeur associé, Université du Québec à Chicoutimi before switching to a quest as an independent consultant in 2015. He is a member of several scientific societies and has published widely across several scientific fields. He is the author of a widely used textbook in Multivariate Data Analysis (chemometrics), and in 2020 published: “Introduction to the Theory and Practice of Sampling”. He was chairman of the taskforce responsible for the world's first horizontal (matrix-independent) sampling standard DS 3077:2024 - Esbensen is the founding editor of: “Sampling Science and Technology (SST)” - https://www.sst-magazine.info/issues/ He can be reached at his homepage https://kheconsult.com/

Abstract Image

求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Chemometrics
Journal of Chemometrics 化学-分析化学
CiteScore
5.20
自引率
8.30%
发文量
78
审稿时长
2 months
期刊介绍: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信