Emmanuelle A Dankwa, Martyn Plummer, Daniel Chapman, Rima Jeske, Julia Butt, Michael Hill, Tim Waterboer, Iona Y Millwood, Ling Yang, Christiana Kartsonaki
Diagnostic and prognostic research, 9(1): 17. Published 2025-08-11. https://doi.org/10.1186/s41512-025-00202-x
Calibrating multiplex serology for Helicobacter pylori.
Background: Helicobacter pylori (H. pylori) is a bacterium that colonizes the stomach and is a major risk factor for gastric cancer, with an estimated 89% of non-cardia gastric cancer cases worldwide attributable to H. pylori. Prospective studies provide reliable evidence for quantifying the association between gastric cancer and H. pylori, as they circumvent the risk of a false negative due to possible reduction in antibody levels before cancer development.
Methods: In a large-scale prospective study within the China Kadoorie Biobank, H. pylori infection is being analysed as a risk factor for gastric cancer. The presence of infection is typically determined by serological tests. The immunoblot test, although well established, is more labour-intensive and uses a larger amount of plasma than the alternative high-throughput multiplex serology test. Immunoblot outputs a binary positive/negative serostatus classification, while multiplex outputs a vector of continuous antigen measurements. Mapping such multidimensional continuous measurements onto a binary classification raises statistical challenges in defining classification cut-offs and in accounting for differences in the infection evidence provided by different antigens. We discuss these challenges and propose a novel solution that optimizes the translation of continuous multiplex serology measurements into probabilities of H. pylori infection, using classification algorithms (Bayesian additive regression trees (BART), multidimensional monotone BART, logistic regression, random forest and elastic net). We (i) calibrate and apply classification models to predict probabilities of H. pylori infection given multiplex measurements, (ii) compare the predictive performance of the models using immunoblot as reference, (iii) discuss reasons for the differences in predictive performance and (iv) apply the calibrated models to gain insights into the relative strengths of infection evidence provided by the various antigens.
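As a rough illustration of the calibration idea, the sketch below fits a plain logistic regression that maps a vector of continuous antigen measurements to a probability of infection, with a binary reference label standing in for the immunoblot result. It is not the authors' implementation: the data are simulated, and the number of antigens, the `SHIFTS` values, the learning rate and the helper names (`fit_logistic`, `predict_proba`, `simulate`) are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of a positive reference result for one sample."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Simulated data (assumption): three hypothetical antigens whose mean signal
# rises by SHIFTS[j] in reference-positive samples; the third carries no signal.
rng = random.Random(0)
SHIFTS = (2.0, 0.5, 0.0)


def simulate(n):
    X, y = [], []
    for _ in range(n):
        status = 1 if rng.random() < 0.5 else 0  # stand-in for immunoblot serostatus
        X.append([rng.gauss(status * s, 1.0) for s in SHIFTS])
        y.append(status)
    return X, y


X_train, y_train = simulate(400)
w, b = fit_logistic(X_train, y_train)

# Probabilities for a high-signal and a low-signal measurement vector.
p_high = predict_proba(w, b, [2.0, 0.5, 0.0])
p_low = predict_proba(w, b, [0.0, 0.0, 0.0])
```

In this toy setting the fitted coefficient for the informative antigen is larger than that of the noise antigen, which is a crude analogue of comparing the strength of infection evidence across antigens. The tree-based models named above (BART, random forest) would replace the linear predictor with a more flexible function of the measurements.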
Results: All models showed high discriminative ability, with area under the curve (AUC) estimates of at least 95% on both the training and test data. Model performance did not differ substantially between the training and test data.
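For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen reference-positive sample receives a higher predicted probability than a randomly chosen reference-negative one. The sketch below computes it directly from that definition and compares a training and a test set. The labels and scores are made up for illustration and are not results from the study.

```python
def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical reference labels (1 = positive) and predicted probabilities.
y_train = [1, 1, 1, 0, 0, 0]
p_train = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_test = [1, 1, 0, 0]
p_test = [0.7, 0.6, 0.65, 0.3]

auc_train = auc(y_train, p_train)  # 8 of 9 pairs ranked correctly
auc_test = auc(y_test, p_test)     # 3 of 4 pairs ranked correctly
print(f"train AUC = {auc_train:.3f}, test AUC = {auc_test:.3f}")
```

A large gap between the two values would suggest overfitting; similar values on training and test data are what the Results describe.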
Conclusions: Classification algorithms can be used to calibrate the H. pylori multiplex serology test to the immunoblot test in the China Kadoorie Biobank. This study furthers our understanding of the applicability of classification algorithms to serological tests.