Machine learning algorithm for the rapid and accurate detection of Plasmodium falciparum
Author: Mr Andrew Hill
Journal: International Journal of Infectious Diseases, Volume 152, Article 107439
Publication date: 2025-03-01
DOI: 10.1016/j.ijid.2024.107439
URL: https://www.sciencedirect.com/science/article/pii/S1201971224005149
Citation count: 0
Abstract
Background
Manual cell counting is a malaria diagnostic bottleneck which could be alleviated by assistance from automated labelling. The high prevalence of malaria in under-developed regions requires highly precise and computationally efficient models to achieve rapid and accurate diagnosis, which in turn has the potential to be developed into a smartphone app.
Methods
A family of tiny machine learning algorithms (MLAs), hybrid convolutional neural network / encoder-decoder models with 3,911 to 100,000 parameters, was developed; each model outputs both a label {Parasite, Normal} and a confidence. The models were evaluated with k-fold validation against an established Plasmodium falciparum cell dataset from the NIH.
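As an illustration of the evaluation scheme, the sketch below shows one common way to build k-fold train/test splits. The abstract does not state the value of k, the shuffling strategy, or the tooling used, so `k=5`, the seed, and the function name `k_fold_indices` are assumptions made for this example only; the sample count of 27,000 is taken from the Results.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold validation.

    Hypothetical helper: each sample lands in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# k=5 is an assumed value; the source only says "k-fold validation".
splits = list(k_fold_indices(n_samples=27000, k=5))
```

Each model would then be trained on `train_idx` and scored on `test_idx`, and the per-fold accuracies averaged.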
Results
The models achieve between 95% and 98.5% accuracy. Labelling cells with a probability of malaria of 10-99% as uncertain and excluding them from the analysis resulted in >99% accuracy for the remaining cells. Accuracy measurement is limited by mislabelled cells, with as few as 120 cells in 27,000 (0.4%) confident but wrong. Consensus between 8 independent models suggests that at least 150 training cells (more than 50% of all “confident but wrong” cells) are mislabelled, and training without these cells improves model convergence and reliability.
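The uncertainty band and the model-consensus check described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the handling of the exact 10% and 99% boundaries is an assumption, and "consensus" is assumed here to mean that every model is confident and disagrees with the dataset label, which the abstract does not specify.

```python
def triage(p_parasite, low=0.10, high=0.99):
    """Map a predicted probability of malaria to a label.

    Probabilities inside [low, high] are treated as uncertain,
    mirroring the 10-99% band in the text (boundaries assumed inclusive).
    """
    if p_parasite < low:
        return "Normal"
    if p_parasite > high:
        return "Parasite"
    return "Uncertain"

def confident_accuracy(probs, truths):
    """Return (accuracy, n_confident) over cells not labelled 'Uncertain'."""
    pairs = [(triage(p), t) for p, t in zip(probs, truths)]
    confident = [(pred, t) for pred, t in pairs if pred != "Uncertain"]
    if not confident:
        return None, 0
    correct = sum(pred == t for pred, t in confident)
    return correct / len(confident), len(confident)

def consensus_suspects(model_probs, truths):
    """Indices where every model is confident and contradicts the given label.

    model_probs holds one probability list per model (8 models in the text).
    """
    suspects = []
    for i, truth in enumerate(truths):
        labels = [triage(probs[i]) for probs in model_probs]
        if all(lab != "Uncertain" and lab != truth for lab in labels):
            suspects.append(i)
    return suspects

# Toy data, invented for illustration only.
probs_a = [0.01, 0.50, 0.995, 0.999, 0.03]
probs_b = [0.02, 0.40, 0.999, 0.998, 0.20]
truths = ["Normal", "Parasite", "Parasite", "Normal", "Normal"]

accuracy, n_confident = confident_accuracy(probs_a, truths)
suspects = consensus_suspects([probs_a, probs_b], truths)
```

In the toy data, the fourth cell is labelled Normal but both models are confident it is a parasite, so it is the only index flagged as a candidate mislabel.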
Discussion
MLAs that assist diagnosis can be relied upon if they output a certainty, and a confident diagnosis can be formed from the confidently labelled cells alone. In many cases a low percentage of cells with uncertain labels will not change the diagnosis. Knowing that almost all cell labelling errors occur within the uncertain cells would enable a clinical workflow where expert time is focused on marginal cells within marginal cases. Larger models are prone to overfitting, and their size limits the hardware they can run on.
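One way to make "uncertain cells will not change the diagnosis" concrete is to bound the infected fraction from below (all uncertain cells are normal) and from above (all uncertain cells are parasitised) and check whether both bounds fall on the same side of a decision threshold. The sketch below assumes a hypothetical decision rule, positive when the infected fraction is at least `threshold`; neither the rule nor the threshold value comes from the abstract.

```python
def parasitaemia_bounds(n_parasite, n_normal, n_uncertain):
    """Lower and upper bounds on the infected-cell fraction."""
    total = n_parasite + n_normal + n_uncertain
    if total == 0:
        raise ValueError("no cells counted")
    lower = n_parasite / total                  # every uncertain cell is normal
    upper = (n_parasite + n_uncertain) / total  # every uncertain cell is infected
    return lower, upper

def needs_expert_review(n_parasite, n_normal, n_uncertain, threshold):
    """True only when resolving the uncertain cells could flip the decision.

    Assumed rule (hypothetical): positive if infected fraction >= threshold.
    """
    lower, upper = parasitaemia_bounds(n_parasite, n_normal, n_uncertain)
    return lower < threshold <= upper

# Toy counts and a 2% threshold, both invented for illustration.
clear_case = needs_expert_review(50, 940, 10, threshold=0.02)
marginal_case = needs_expert_review(15, 975, 10, threshold=0.02)
```

Under these assumptions only the marginal case is routed to an expert, and within it only the 10 uncertain cells need to be examined.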
Conclusion
Accurate Plasmodium falciparum parasite identification is possible with 12,000-parameter models. Automation of bulk labelling work would allow expert time to be focused on cases where uncertainty would affect diagnosis. A path to reliable, rapid and mobile malaria diagnosis has been identified, based on tiny models suitable for mobile phone deployment in poor, malaria-affected countries. Further work to enable rapid response to malaria is required.
About the journal:
International Journal of Infectious Diseases (IJID)
Publisher: International Society for Infectious Diseases
Publication Frequency: Monthly
Type: Peer-reviewed, Open Access
Scope:
Publishes original clinical and laboratory-based research.
Reports clinical trials, reviews, and some case reports.
Focuses on epidemiology, clinical diagnosis, treatment, and control of infectious diseases.
Emphasizes diseases common in under-resourced countries.