Data Quality: Importance of the ‘Before Analysis’ Domain [Theory of Sampling (TOS)]
Author: Kim H. Esbensen
Journal: Journal of Chemometrics, volume 39, issue 4 (impact factor 2.3)
Publication date: 2025-04-06
DOI: 10.1002/cem.70021
URL: https://onlinelibrary.wiley.com/doi/10.1002/cem.70021
Citation count: 0
Abstract
Data quality: what is it, where does it originate, how does it influence data modelling, and what can chemometricians do about it? The ‘before analysis’ domain is prone to sampling errors, and the resulting uncertainties affect the quality of both analysis and data analysis/data modelling. Nonrepresentative sampling of heterogeneous materials, batches, lots and process streams ‘before analysis’ contributes significantly to the total measurement uncertainty, MU_total = MU_sampling + MU_analysis. The total sampling error (TSE) can dominate the total analytical error (TAE) by factors of 5, 10 or higher, depending on the degree of material heterogeneity encountered and the specific sampling procedure employed to produce the final analytical aliquot, which is the only material actually analysed. The analytical aliquot is the physical manifestation of crossing the boundary from the before-analysis (sampling) domain to the domain of analysis. Representativity of the analytical aliquot, and thus of the analytical results with respect to the original target batch/lot/process stream, can only be guaranteed by invoking the necessary sampling-domain competence stipulated by the theory of sampling (TOS). Primary sampling is the most important stage in the full lot-to-analysis pathway, quantitatively dominating MU_total (but subsequent subsampling stages can also be significant). If the sources of adverse sampling error effects have not been eliminated, the sampling process is biased and MU_total will be unnecessarily inflated. TOS offers ways and means to deal actively with a potential sampling bias (which is fundamentally different from the analytical bias). Overlooking, or deliberately ignoring, the need to deal appropriately with sampling effects constitutes a lack of due diligence, which has a critical bearing on the QC/QA demands on both analysis and data analysis/modelling.
This article presents all uncertainty contributions in the lot-to-analysis-to-data-modelling pathway. These must be identified and managed (eliminated or maximally reduced) in order to document a fully minimised MU_total. Data analysts/chemometricians are part of a scientific collegium covering all three domains (sampling, analysis and data modelling), which are collectively responsible for ‘data quality’. This comprehensive scope has serious implications for the current PAT paradigm, the foundation of which turns out to need significant reform regarding a key process sampling aspect, regardless of whether physical samples are extracted or PAT sensor technology spectra are acquired. This article introduces the essential minimum TOS competence that must be mastered by stakeholders from all three domains.
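As a rough illustration of the claim that TSE can dominate TAE by factors of 5, 10 or higher, the following minimal sketch computes how much of the total variance would then come from sampling. It rests on two assumptions that are not stated in the abstract: the factor is read as a ratio of standard deviations, and the sampling and analytical contributions are treated as independent, so that their variances add. The function name `sampling_share` and its parameters are hypothetical and chosen only for this example.

```python
import math


def sampling_share(tse_to_tae_ratio: float, tae_sd: float = 1.0) -> dict:
    """Split a total uncertainty into sampling and analytical parts.

    Assumption (not from the article): the two contributions are
    independent, so their variances add; the ratio is interpreted as
    TSE / TAE on the standard-deviation scale.
    """
    if tse_to_tae_ratio < 0 or tae_sd <= 0:
        raise ValueError("ratio must be >= 0 and tae_sd must be > 0")

    tse_sd = tse_to_tae_ratio * tae_sd   # sampling standard deviation
    var_sampling = tse_sd ** 2
    var_analysis = tae_sd ** 2
    var_total = var_sampling + var_analysis

    return {
        "total_sd": math.sqrt(var_total),
        "sampling_share": var_sampling / var_total,  # fraction of total variance
    }


if __name__ == "__main__":
    for ratio in (1, 5, 10):
        result = sampling_share(ratio)
        print(
            f"TSE/TAE = {ratio:>2}: "
            f"total sd = {result['total_sd']:.2f}, "
            f"sampling share of variance = {result['sampling_share']:.1%}"
        )
```

Under these assumptions, a ratio of 5 already places 25/26 (about 96%) of the total variance in the sampling stage, which suggests why improving the analytical method alone would do little to reduce MU_total in such a case.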
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.