Supervised machine learning model for serum protein electrophoresis data interpretation.

IF 3.0 | Zone 3, Medicine | JCR Q1, PATHOLOGY

Pathology Pub Date : 2025-07-24 DOI:10.1016/j.pathol.2025.05.010

Yee-Ting Cheung, Hoi-Shan Leung, Jeremiah Sik-Bit Tseung, Kelvin Yat-Chung Yu, Mei-Tik Stella Leung, Chor-Kwan Ching, Yeow-Kuan Chong

Citations: 0

Abstract

Serum protein electrophoresis (SPE) is a frequently requested laboratory test in clinical settings, with over 10,000 requests per annum in our centre. It is primarily used for the diagnosis and monitoring of paraproteinaemia. Interpretation of SPE is time-consuming and relies on the expertise of pathologists, with potential interobserver variability. Assistance from machine learning algorithms could improve efficiency and objectivity. Digitised capillary electrophoresis (CE) tracings acquired using the Sebia Minicap Protein(E) 6 kit were extracted from the analyser, and the corresponding SPE reports were obtained from the laboratory information systems of Princess Margaret Hospital (PMH) and Tuen Mun Hospital (TMH). Three artificial neural networks (for fractionation, classification, and location plus quantification) were trained and evaluated against reference interpretations from one to two pathologists. Samples from PMH constituted the training datasets, and the trained models were subsequently evaluated with samples from TMH. A total of 41,448 and 24,501 CE tracings with corresponding SPE reports, spanning October 2014 to November 2022, were obtained from PMH and TMH, respectively; 25,661 to 41,014 PMH samples constituted the training datasets, and the trained models were evaluated with 24,238 TMH samples. The classification model achieved an area under the receiver operating characteristic curve of 0.976 in the testing dataset, with an agreement rate of 93.8%. Across the six serum protein bands, the differences between the fractionation model and the reported manual fractionation had means ranging from -0.0884 to 0.155 g/L and standard deviations ranging from 0.315 to 2.04 g/L. Peak quantification by the location plus quantification model correlated with manual quantification, with a Spearman's r of 0.976. The machine learning models achieved near-human performance. They enabled high-throughput SPE analysis and interpretation and improved the objectivity and reproducibility of results.
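The abstract reports four kinds of evaluation figures: area under the ROC curve, agreement rate, mean and standard deviation of the model-minus-manual difference, and Spearman's r. The sketch below shows one common way such figures can be computed, using only the Python standard library. It is not the authors' code. The binary label setup, the 0.5 decision threshold, all function names, and the toy numbers are assumptions made purely for illustration; none of the values come from the study.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """AUC from the Mann-Whitney U statistic; labels are 0 or 1."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def agreement_rate(labels, scores, threshold=0.5):
    """Fraction of samples where the thresholded score matches the label.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_r(x, y):
    """Spearman's r: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def difference_stats(model, manual):
    """Mean and sample standard deviation of (model - manual)."""
    diffs = [m - r for m, r in zip(model, manual)]
    return mean(diffs), stdev(diffs)


# Toy values, invented for illustration only (not data from the study).
toy_labels = [0, 0, 1, 1, 0, 1]               # 1 = abnormal per reference
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]  # hypothetical model outputs
toy_manual = [5.0, 12.0, 8.0, 20.0]           # manual quantification, g/L
toy_model = [5.5, 11.0, 8.5, 21.0]            # model quantification, g/L

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print("Agreement:", agreement_rate(toy_labels, toy_scores))
    print("Spearman r:", spearman_r(toy_model, toy_manual))
    print("Mean/SD of difference:", difference_stats(toy_model, toy_manual))
```

The rank-based AUC avoids building the ROC curve explicitly and handles tied scores through average ranks. In practice, library routines such as scikit-learn's `roc_auc_score` or SciPy's `spearmanr` would normally be preferred over hand-written versions.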

Source journal

Pathology (Medicine: Pathology)

CiteScore: 6.50

Self-citation rate: 2.20%

Articles published: 459

Review time: 54 days

Journal description: Published by Elsevier since 2016, Pathology is the official journal of the Royal College of Pathologists of Australasia (RCPA). It is committed to publishing peer-reviewed, original articles related to the science of pathology in its broadest sense, including anatomical pathology, chemical pathology and biochemistry, cytopathology, experimental pathology, forensic pathology and morbid anatomy, genetics, haematology, immunology and immunopathology, microbiology and molecular pathology.