Machine learning techniques to identify risk factors of breast cancer among women in Mashhad, Iran.

Atieh Khaleghi, Seyyed Mohammad Tabatabaei, Zeinab Sadat Hosseini, Moslem Taheri Soodejani, Ehsan Mosa Farkhani, Maryam Yaghoobi
Journal of Preventive Medicine and Hygiene. Published 2024-08-31. DOI: 10.15167/2421-4248/jpmh2024.65.2.3045. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11487743/pdf/

Abstract

Background: Low breast cancer survival rates in developing countries are mainly due to a lack of early detection plans and of adequate diagnosis and treatment facilities.

Objectives: This study aimed to apply machine learning techniques to identify the most important breast cancer risk factors.

Methods: This case-control study included women aged 17-75 years who were referred to medical centers affiliated with Mashhad University of Medical Sciences between March 21, 2015, and March 19, 2016. The study had two datasets: one with 516 samples (258 cases and 258 controls) and another with 606 samples (303 cases and 303 controls). Written informed consent was obtained. Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), and Principal Component Analysis (PCA) were applied using RStudio.

Results: For DT and RF, the most important features associated with breast cancer were family cancer, individual history of breast cancer, biopsy sampling, and infrequent consumption of dairy, fruit, and vegetable meals. For PCA and LR, the most important features were family cancer, pregnancy number, pregnancy tendency, abortion, first menstruation, age at first childbirth, and childbirth number.

Conclusions: Machine learning algorithms can be used to extract the most important factors in the diagnosis of breast cancer in developing countries such as Iran.
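
To illustrate how tree-based methods such as DT and RF can rank candidate risk factors, the following minimal Python sketch scores binary features by the decrease in Gini impurity they produce when used as a split. It is an illustration only: the abstract does not state which importance measure the authors used, the study itself was run in RStudio rather than Python, and the feature names and data below are hypothetical toy values, not the study's data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (1 = case, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(feature, labels):
    """Impurity decrease obtained by splitting the samples on one binary feature."""
    absent = [y for x, y in zip(feature, labels) if x == 0]
    present = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted = (len(absent) / n) * gini(absent) + (len(present) / n) * gini(present)
    return gini(labels) - weighted


# Synthetic toy data (assumption, NOT from the study): 4 cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "family_history": [1, 1, 1, 0, 0, 0, 0, 1],  # mostly present among cases
    "uninformative": [1, 0, 1, 0, 1, 0, 1, 0],   # evenly spread across both groups
}

# Rank features from most to least informative for separating cases from controls.
ranking = sorted(features, key=lambda name: gini_gain(features[name], labels), reverse=True)
print(ranking)
```

A feature that separates cases from controls well yields a larger impurity decrease and ranks higher. A random forest typically aggregates such decreases over many trees grown on resampled data, which is one reason its ranking can differ from that of a single tree or of a regression model.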
