B-cell lymphoma classification using vision-language models and in-context learning

Mobina Shrestha, Bishwas Mandal, Vishal Mandal, Amir Babu Shrestha
Dear Editor,
Accurate classification of B-cell lymphoma is essential for guiding treatment decisions and prognostic assessments. Subtypes such as chronic lymphocytic leukaemia (CLL), follicular lymphoma (FL), and mantle cell lymphoma (MCL) often show overlapping morphologic features, particularly in small biopsies or poorly preserved samples. Even with supporting ancillary tests, distinguishing between these subtypes can be difficult, especially outside large university centres where haematopathology subspecialists may not be available. Digital pathology has brought with it the possibility of augmenting diagnostic accuracy with artificial intelligence (AI), particularly through deep learning algorithms. Several studies have shown promising results when convolutional neural networks are trained on thousands of annotated images to identify lymphoid neoplasms and other malignancies.1, 2 However, these approaches often require large-scale, curated datasets annotated by domain experts.
This is where in-context learning (ICL) offers a meaningful alternative. ICL allows models to generate predictions based on just a few labelled examples shown at inference time, without the need for annotated datasets or model retraining. This mirrors how clinicians reason through new cases: by recalling similar prior examples and using them to guide interpretation. Large vision-language models (VLMs) have demonstrated this ability in domains such as dermatopathology, radiology, and gastrointestinal histology. However, despite this progress, to date no studies have applied ICL to lymphoma subtyping. Given that B-cell lymphomas have well-described morphologic patterns and are amongst the most common lymphoid neoplasms encountered in practice, they are an ideal test case for this approach.
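To make the ICL pattern concrete, the minimal sketch below shows how labelled support examples are interleaved into a single prompt, with the unlabelled query appended last; the model must infer the diagnosis from the in-prompt examples alone, with no gradient updates. This is a hypothetical scaffold, not the study's code: the function and file names are ours, and a chat-style model API is assumed.

```python
# Minimal sketch of the ICL pattern, assuming a chat-style model API.
# Function and file names are hypothetical; no retraining is involved --
# the support examples live only in the prompt.
from typing import List, Tuple

def build_icl_prompt(support: List[Tuple[str, str]], query_image: str) -> List[dict]:
    """Interleave labelled (image, diagnosis) support pairs, then append the unlabelled query."""
    messages = [{
        "role": "system",
        "content": "You are a haematopathology assistant. "
                   "Classify each H&E image as CLL, FL, or MCL.",
    }]
    for image_ref, label in support:
        messages.append({"role": "user", "content": f"Image: {image_ref}"})
        messages.append({"role": "assistant", "content": f"Diagnosis: {label}"})
    # The query carries no label; the model must infer it from the examples above.
    messages.append({"role": "user", "content": f"Image: {query_image}\nDiagnosis:"})
    return messages

# A 3-shot prompt: three labelled reference cases, one unknown case.
prompt = build_icl_prompt(
    support=[("case_012.png", "CLL"), ("case_047.png", "FL"), ("case_103.png", "MCL")],
    query_image="case_999.png",
)
```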
Therefore, in this study, we evaluated four state-of-the-art VLMs, namely GPT-4o, Paligemma, CLIP and ALIGN, in classifying CLL, FL, and MCL using digital histopathology images. We assessed model performance in zero-shot and few-shot settings, simulating real-world diagnostic constraints where only a handful of reference cases may be available. Our aim is not to replace pathologists but to explore whether this type of AI can be used as a low-barrier, annotation-efficient tool to support lymphoma diagnosis, especially in environments where expert pathology review is limited.
In this study, a total of 150 haematoxylin and eosin (H&E)-stained histopathology images, 50 each of CLL, FL and MCL, were used. All images were obtained from the publicly available malignant lymphoma classification dataset on Kaggle.3 Testing for GPT-4o was performed via the OpenAI Python API. Paligemma was implemented using the pretrained checkpoint (google/paligemma-3b-mix-224) from the Hugging Face model hub, configured for image-text inference. CLIP was implemented using the ViT-B/32 backbone (openai/clip-vit-base-patch32). To approximate ALIGN, we used the open-source kakaobrain/align-base model, which follows the original ALIGN architecture; for clarity, we refer to this model as “ALIGN” throughout the study. This implementation has been previously used in similar work by others.4, 5 Models were tested using ICL at 0-, 3-, 5- and 10-shot settings. For each test case, support examples were randomly sampled from the remaining dataset and embedded into a structured prompt containing both the image and its diagnostic label. Prompts were framed using standardised clinical instructions, and label order was randomised to reduce positional bias. Model performance was evaluated using weighted F1 scores, with 95% confidence intervals (CI) calculated by bootstrap resampling (n = 10,000). Figure 1A shows the schematic workflow of the study.
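As an illustration of the GPT-4o pipeline, the sketch below shows a few-shot call through the OpenAI Python SDK. The exact prompt wording, the helper names (encode_image, classify) and the max_tokens setting are our assumptions; the study reports only the general prompt design (standardised instructions, randomised label order, inline images).

```python
# Hedged sketch of few-shot GPT-4o classification via the OpenAI Python SDK.
# Helper names and prompt wording are our assumptions, not the study's code.
import base64
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> dict:
    """Inline a PNG file as a base64 data-URI image content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}

def classify(query_path: str, support: list) -> str:
    """support: list of (image_path, label) pairs sampled from the rest of the dataset."""
    support = random.sample(support, len(support))  # shuffle to reduce positional bias
    content = [{"type": "text",
                "text": "You are reviewing H&E histopathology images. "
                        "Labelled reference images follow; classify the final image "
                        "as CLL, FL, or MCL and answer with the label only."}]
    for path, label in support:
        content.append(encode_image(path))
        content.append({"type": "text", "text": f"Diagnosis: {label}"})
    content.append(encode_image(query_path))
    content.append({"type": "text", "text": "Diagnosis:"})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
        max_tokens=5,
    )
    return response.choices[0].message.content.strip()
```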
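For the contrastive models, classification reduces to image-text similarity. A minimal zero-shot sketch using the Hugging Face transformers CLIP classes follows; ALIGN is handled analogously via AlignModel/AlignProcessor with the kakaobrain/align-base checkpoint. The prompt templates and file name are our assumptions. In the few-shot settings, the support images can instead be embedded independently and the query assigned by embedding similarity, consistent with these models processing each support example separately.

```python
# Hedged sketch of zero-shot CLIP scoring; prompt templates are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = [
    "an H&E histopathology image of chronic lymphocytic leukaemia",
    "an H&E histopathology image of follicular lymphoma",
    "an H&E histopathology image of mantle cell lymphoma",
]
labels = ["CLL", "FL", "MCL"]

image = Image.open("query_case.png").convert("RGB")  # hypothetical test image
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity, shape (1, 3)
prediction = labels[logits.softmax(dim=-1).argmax().item()]
```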
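The evaluation can likewise be reproduced with a short bootstrap routine. The sketch below (variable and function names are ours) computes the weighted F1 score and a 95% percentile interval over 10,000 resamples; the percentile method is one straightforward reading of the reported procedure, which does not specify the interval type.

```python
# Hedged sketch of the evaluation: weighted F1 with a 10,000-resample
# bootstrap percentile CI, matching the reported resample count.
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_f1_ci(y_true, y_pred, n_boot=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
        stats.append(f1_score(y_true[idx], y_pred[idx], average="weighted"))
    point = f1_score(y_true, y_pred, average="weighted")
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)

# Usage: point estimate plus 95% CI for one model at one shot level.
# f1, (ci_lo, ci_hi) = bootstrap_f1_ci(true_labels, predicted_labels)
```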
Our experiments indicated that performance improved consistently across all models as the number of few-shot examples increased, as shown in Figure 1B. GPT-4o achieved the highest overall F1 scores at each shot level, increasing from 0.54 (95% CI: 0.49–0.58) in the zero-shot setting to 0.74 (CI: 0.65–0.81) with 10-shot prompting. Paligemma achieved comparable F1 scores, obtaining 0.50 (95% CI: 0.45–0.56) in the zero-shot setting, and improved with few-shot prompting, reaching an F1 score of 0.71 (CI: 0.64–0.79) at 10-shot. CLIP and ALIGN showed moderate gains but appeared to plateau earlier, with 10-shot F1 scores of 0.67 (CI: 0.61–0.74) and 0.70 (CI: 0.63–0.75), respectively. The largest F1 score improvements for all models occurred between 0-shot and 5-shot, with more modest improvements from 5- to 10-shot, indicating diminishing returns beyond a certain point. As more examples were shown to the VLMs, the performance gap between the models began to narrow, particularly between GPT-4o and Paligemma, implying that exposure to a few prior examples was enough to bring the models to comparable levels of performance.
Comparing model performance by lymphoma subtype, we observed mixed results, consistent with the morphologic differences between the entities. CLL, however, was consistently well classified across all models, as shown in Figure 1C. Even in the absence of support examples (i.e., at zero-shot), models were able to recognise typical CLL features such as small, mature lymphocytes and proliferation centres. At 10-shot, GPT-4o and Paligemma both reached an F1 score of 0.79, whilst ALIGN and CLIP each achieved an F1 score of 0.74 for CLL prediction. FL, on the other hand, was more difficult to predict, especially in the zero-shot setting. This likely reflects its variable nodular architecture and the overlap of some of its features with other small B-cell lymphomas. Performance improved, however, with the addition of support examples. GPT-4o showed the largest improvement, increasing from an F1 score of 0.48 to 0.72, demonstrating that FL classification benefited from few-shot prompting. Paligemma attained the second-best result, with an F1 score of 0.69 at 10-shot. Finally, in predicting MCL, the models performed somewhat better than they did for FL, but their results were still not as strong as for CLL. Although zero-shot F1 scores were modest across models, all showed better performance with increasing shot numbers. At 10-shot, GPT-4o led with an F1 score of 0.71, followed closely by ALIGN (F1 = 0.68), Paligemma (F1 = 0.66) and CLIP (F1 = 0.64). The improvements here suggest that the models were able to learn and apply subtle features such as nuclear irregularity and cytologic monotony. Overall, the models performed best when morphologic patterns were distinct, and benefited from even a few well-chosen reference cases when features were more ambiguous.
One notable bottleneck during experimentation was prompt length, which posed a practical limitation for GPT-4o and Paligemma, as both models operate within fixed input token capacities. We were nevertheless able to include all 10 examples per class without truncation by optimising prompt formatting, reducing redundancy in the prompt phrasing, and keeping image resolution low enough for the encoded images to fit within the context window. CLIP and ALIGN, by contrast, processed each support example independently, so prompt length was not a limiting factor for those models. Notably, without any model retraining, all four evaluated VLMs (GPT-4o, Paligemma, CLIP and ALIGN) showed consistent improvements in performance as the number of few-shot examples increased. GPT-4o achieved the highest overall accuracy and the most stable gains across all settings, particularly in diagnostically challenging subtypes such as FL and MCL. These findings suggest that, even with a limited number of reference cases, pretrained VLMs can be guided to perform complex morphologic classification tasks with reasonable F1 scores. Whilst the results are promising, several practical limitations remain, including variability in image quality and the controlled nature of the dataset. Further work is therefore needed to validate this approach in larger, more diverse cohorts and to assess its reliability across a wider range of morphologic scenarios.
Conceptualisation: Mobina Shrestha and Vishal Mandal. Methods: Mobina Shrestha, Bishwas Mandal and Vishal Mandal. Formal Analysis: Mobina Shrestha, Bishwas Mandal and Vishal Mandal. Data Analysis: Mobina Shrestha. Figures and Visualisation: Mobina Shrestha. Original Paper Writing: Mobina Shrestha. Paper Revision and Edits: Bishwas Mandal, Vishal Mandal and Amir Babu Shrestha.
The authors declare no conflicts of interest.
This study was conducted using publicly available, de-identified datasets and did not involve identifiable patient data. As such, institutional review board approval and informed consent were not required.