Data Quality: Importance of the ‘Before Analysis’ Domain [Theory of Sampling (TOS)]

Kim H. Esbensen
Journal of Chemometrics, volume 39, issue 4. Journal article. Published 2025-04-06. Impact factor: 2.3.
DOI: 10.1002/cem.70021
https://onlinelibrary.wiley.com/doi/10.1002/cem.70021

Abstract

Data quality: what is it, where does it originate, how does it influence data modelling, and what can chemometricians do about it? The ‘before analysis’ domain is prone to sampling errors, and the resulting uncertainties affect the quality of both analysis and data analysis/data modelling. Nonrepresentative sampling of heterogeneous materials, batches, lots and process streams ‘before analysis’ contributes significantly to the total measurement uncertainty, MU_total = MU_sampling + MU_analysis. The total sampling error (TSE) can dominate the total analytical error (TAE) by factors of 5, 10 or higher, depending on the degree of material heterogeneity encountered and the specific sampling procedure employed to produce the final analytical aliquot, which is the only material actually analysed. The analytical aliquot is the physical manifestation of crossing the boundary from the ‘before analysis’ (sampling) domain to the domain of analysis. Representativity of the analytical aliquot, and thus of the analytical results with respect to the original target batch/lot/process stream, can only be guaranteed by invoking the necessary sampling competence stipulated by the Theory of Sampling (TOS). Primary sampling is the most important stage in the full lot-to-analysis pathway, quantitatively dominating MU_total (although subsequent subsampling stages can also be significant). If the sources of adverse sampling error effects have not been eliminated, the sampling process is biased and MU_total will be unnecessarily inflated. TOS offers ways and means to deal actively with a potential sampling bias (which is fundamentally different from analytical bias). Overlooking sampling effects, or deliberately declining to deal with them appropriately, constitutes a lack of due diligence, which has a critical bearing on the QC/QA demands on both analysis and data analysis/modelling.
This article presents all uncertainty contributions in the lot-to-analysis-to-data-modelling pathway; these must be identified and managed (eliminated or maximally reduced) in order to document a fully minimised MU_total. Data analysts/chemometricians are part of a scientific collegium covering all three domains (sampling, analysis and data modelling), which are collectively responsible for ‘data quality’. This comprehensive scope has serious implications for the current process analytical technology (PAT) paradigm, whose foundation turns out to need significant reform regarding a key process sampling aspect, regardless of whether physical samples are extracted or PAT sensor spectra are acquired. This article introduces the essential minimum TOS competence that must be mastered by stakeholders from all three domains.
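
To give a feel for the factors of 5 and 10 quoted in the abstract, the following minimal sketch computes the share of the total measurement uncertainty attributable to sampling. It takes the additive expression from the abstract at face value and assumes that "dominating by a factor r" means the sampling contribution is r times the analytical contribution in the same units. The function name `sampling_share` and that reading of the ratio are illustrative assumptions, not something the article specifies.

```python
def sampling_share(ratio: float) -> float:
    """Fraction of the total uncertainty that comes from sampling.

    Assumption (not from the article): the sampling contribution equals
    `ratio` times the analytical contribution, and the two add linearly,
    i.e. total = sampling + analysis.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    # With analysis = 1 unit and sampling = ratio units, total = 1 + ratio.
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for r in (1, 5, 10):
        print(f"TSE/TAE = {r:>2}: sampling share = {sampling_share(r):.1%}")
```

Under these assumptions a ratio of 5 puts about 83% of the total on the sampling side and a ratio of 10 about 91%, which is why improving the analytical method alone can do little to reduce the total.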

Source journal: Journal of Chemometrics (Chemistry, Analytical Chemistry)
CiteScore: 5.20
Self-citation rate: 8.30%
Articles published: 78
Review time: 2 months
About the journal: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.