A machine learning tool for identifying metastatic colorectal cancer in primary care.
Eliya Abedi, Marcela Ewing, Elinor Nemlander, Jan Hasselström, Annika Sjövall, Axel C Carlsson, Andreas Rosenblad
Scandinavian Journal of Primary Health Care, pp. 1-9. Published 13 March 2025. DOI: 10.1080/02813432.2025.2477155
Abstract
Background: Detection of colorectal cancer (CRC) is mainly achieved by clinical assessment. As new treatments become available for metastatic CRC (MCRC), it is important to accurately identify these patients.
Aim: To develop a predictive model for identifying MCRC in primary health care patients using diagnostic data analysed with machine learning.
Design and setting: A case-control study utilising data on primary health care visits for 146 patients aged >18 years who were diagnosed with MCRC in the Västra Götaland Region, Sweden, during 2011, and 577 controls matched on sex, age, and primary health care centre.
Method: Stochastic gradient boosting was used to construct a model for predicting the presence of MCRC from the number of primary health care consultations and the diagnostic codes recorded at those consultations during the year before the index (diagnosis) date. Variable importance was estimated using the normalised relative influence (NRI) score. Risks of having MCRC were calculated using odds ratios of marginal effects (OR_ME).
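To make the Method concrete, here is a minimal Python sketch of stochastic gradient boosting for a binary outcome. It is built on several stated assumptions. The abstract does not describe the software or tuning used, so the following are all hypothetical choices for illustration: depth-one trees (stumps), Bernoulli deviance with Newton-step leaf values, the defaults `n_trees=50`, `shrinkage=0.1` and `bag_fraction=0.5`, and every function name. Relative influence is accumulated as each variable's squared-error improvement across the splits that use it. It is then rescaled to percentages summing to 100, which is how the NRI is commonly defined. The `marginal_odds_ratio` function is only one possible reading of "odds ratios of marginal effects": it compares the average predicted risk with a variable forced to one value versus another. This is not necessarily the authors' definition. The toy data are invented.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, r, rows):
    """Pick the split (feature j, threshold t; x <= t goes left) that most reduces
    the squared error of pseudo-residuals r over the sampled rows."""
    mean_all = sum(r[i] for i in rows) / len(rows)
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in rows})[:-1]:
            left = [r[i] for i in rows if X[i][j] <= t]
            right = [r[i] for i in rows if X[i][j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            # Between-group sum of squares = reduction in squared error
            gain = len(left) * (ml - mean_all) ** 2 + len(right) * (mr - mean_all) ** 2
            if best is None or gain > best[2]:
                best = (j, t, gain)
    return best


def fit_sgb(X, y, n_trees=50, shrinkage=0.1, bag_fraction=0.5, seed=1):
    """Stochastic gradient boosting of stumps under Bernoulli deviance (log-odds scale)."""
    rng = random.Random(seed)
    n = len(X)
    prior = sum(y) / n
    f0 = math.log(prior / (1 - prior))  # start from the overall log-odds
    F = [f0] * n
    trees, influence = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        probs = [sigmoid(f) for f in F]
        r = [y[i] - probs[i] for i in range(n)]  # negative gradient of the deviance
        rows = rng.sample(range(n), int(bag_fraction * n))  # random subsample: the "stochastic" part
        split = fit_stump(X, r, rows)
        if split is None:
            break
        j, t, gain = split
        influence[j] += gain  # credit the improvement to the splitting variable
        leaf = {}
        for goes_left in (True, False):
            idx = [i for i in rows if (X[i][j] <= t) == goes_left]
            # One Newton step for the log-odds in this leaf
            leaf[goes_left] = sum(r[i] for i in idx) / sum(probs[i] * (1 - probs[i]) for i in idx)
        trees.append((j, t, leaf[True], leaf[False]))
        for i in range(n):
            F[i] += shrinkage * (leaf[True] if X[i][j] <= t else leaf[False])
    return (f0, shrinkage, trees), influence


def predict_proba(model, x):
    f0, shrinkage, trees = model
    f = f0 + sum(shrinkage * (lv if x[j] <= t else rv) for j, t, lv, rv in trees)
    return sigmoid(f)


def normalised_relative_influence(influence):
    # Rescale raw influences to percentages that sum to 100
    total = sum(influence)
    return [100.0 * v / total for v in influence]


def marginal_odds_ratio(model, X, j, value_a, value_b):
    """Hypothetical OR_ME: odds of the average predicted risk with feature j set
    to value_a for everyone, divided by the same with feature j set to value_b."""
    def avg_risk(v):
        return sum(predict_proba(model, row[:j] + [v] + row[j + 1:]) for row in X) / len(X)
    pa, pb = avg_risk(value_a), avg_risk(value_b)
    return (pa / (1 - pa)) / (pb / (1 - pb))


# Toy data (invented): [n_consultations, code_A, code_B]; only the count is informative
X = [[i % 11, (i // 7) % 2, (i // 3) % 2] for i in range(200)]
y = [1 if row[0] >= 6 else 0 for row in X]
model, influence = fit_sgb(X, y)
print([round(v, 1) for v in normalised_relative_influence(influence)])
print(round(marginal_odds_ratio(model, X, 0, 10, 0), 1))
```

At each iteration the sketch draws a fresh random half of the patients, and this subsampling is what makes the boosting "stochastic". In the toy data only the consultation count separates cases from controls, so it should receive most or all of the normalised influence.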
Results: The optimal model included 76 variables with non-zero influence and had an area under the receiver operating characteristic curve (AUC) of 76.5%, a sensitivity of 77.8%, and a specificity of 69.2%. The 10 most important variables had a combined NRI of 61.0%. The number of consultations during the year before the index date had the highest NRI, at 19.2%, with an OR_ME of 3.3.
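The performance figures in the Results follow from a model's predicted risks through standard formulas. This Python sketch computes three of them:

- the AUC, as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control;
- sensitivity and specificity at a cut-off;
- the combined NRI of the top-ranked variables.

The abstract does not state the cut-off used, so `threshold` is an assumption. The scores below are made-up toy values, not study data. An AUC reported as 76.5% corresponds to 0.765 on this scale.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in controls)
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as 'MCRC predicted'; threshold is an assumed cut-off."""
    tp = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s >= threshold)
    fn = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s < threshold)
    tn = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s < threshold)
    fp = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def combined_nri(nri, k=10):
    """Sum of the k largest NRI percentages (e.g. the top 10 variables)."""
    return sum(sorted(nri, reverse=True)[:k])


# Toy predicted risks and true labels (1 = case, 0 = control), invented for illustration
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))  # 8 of 9 case-control pairs ranked correctly
print(sensitivity_specificity(scores, labels, 0.5))
print(combined_nri([19.2, 12.0, 5.5, 3.3], k=2))
```

Sensitivity and specificity trade off as `threshold` moves, whereas the AUC summarises ranking quality over all possible cut-offs.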
Conclusion: A machine learning method based on primary health care consultation frequency and diagnoses may be used to identify important variables for predicting the presence of MCRC. Both primary health care consultations and their associated diagnostic codes need to be taken into consideration.
About the journal:
Scandinavian Journal of Primary Health Care is an international online open access journal publishing articles relevant to general practice and primary health care. Focusing on continuous professional development in family medicine, the journal addresses clinical, epidemiological and humanistic topics in relation to daily clinical practice.
Scandinavian Journal of Primary Health Care is owned by the members of the National Colleges of General Practice in the five Nordic countries through the Nordic Federation of General Practice (NFGP). The journal includes original research on topics related to general practice and family medicine, and publishes both quantitative and qualitative original research, editorials, discussion and analysis papers and reviews to facilitate continuing professional development in family medicine. The journal's topics range broadly and include:
• Clinical family medicine
• Epidemiological research
• Qualitative research
• Health services research