O. Castrillón, J. A. Arango, Luis Fernando Castillo
Análisis de la fertilidad por medio de técnicas de minería datos (Fertility analysis using data mining techniques). Información Tecnológica, published 2022-06-01. DOI: 10.4067/s0718-07642022000300203
Abstract
The primary objective of this study is to identify the variables that most strongly affect a person's fertility. The study uses the machine learning and data mining platform Weka, the expectation maximization (EM) and SimpleKMeans clustering algorithms, and the classification algorithm J48 (Weka's implementation of the C4.5 decision tree), which behaves similarly to a Bayesian algorithm. An existing database is first modeled to obtain 105 records and nine variables: eight independent variables (age, illnesses, accidents, surgeries, fever, alcohol, smoker, and sedentary lifestyle) and one dependent variable (fertility). The results identify the five most influential variables: 1) age, 2) accidents, 3) fever, 4) surgery, and 5) alcohol. The success rate exceeds 90% when an 80%/20% validation split is applied. The study concludes that the random forest and clustering algorithms employed here make it possible to clearly determine the variables that most affect a person's fertility.
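The abstract combines a ranking of influential variables, the J48 decision tree, and an 80%/20% validation. The minimal Python sketch below illustrates those ideas under stated assumptions; it is not the authors' procedure. It ranks attributes by gain ratio, the split criterion of C4.5 (the algorithm J48 implements), and scores a one-level decision stump, not a full J48 tree, on an 80%/20% holdout split. The toy data and every name in it (`TOY_ROWS`, `gain_ratio`, `holdout_split`, `stump_accuracy`, and so on) are hypothetical and are not taken from the study.

```python
import math
import random
from collections import Counter

# Hypothetical toy data, invented for illustration (NOT the study's 105 records):
# each row holds binary risk factors; each label is a fertility diagnosis.
TOY_ROWS = [
    {"fever": 1, "alcohol": 1, "smoker": 1},
    {"fever": 1, "alcohol": 0, "smoker": 1},
    {"fever": 1, "alcohol": 1, "smoker": 0},
    {"fever": 1, "alcohol": 0, "smoker": 0},
    {"fever": 0, "alcohol": 1, "smoker": 1},
    {"fever": 0, "alcohol": 0, "smoker": 1},
    {"fever": 0, "alcohol": 1, "smoker": 0},
    {"fever": 0, "alcohol": 0, "smoker": 0},
]
TOY_LABELS = ["altered"] * 4 + ["normal"] * 4


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio: information gain divided by split information."""
    n = len(rows)
    groups = {}  # attribute value -> labels of the rows having that value
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def rank_attributes(rows, labels, attrs):
    """Order attributes from highest to lowest gain ratio (ties keep input order)."""
    return sorted(attrs, key=lambda a: gain_ratio(rows, labels, a), reverse=True)


def holdout_split(rows, labels, train_frac=0.8, seed=0):
    """Shuffle, then split into train/test parts (an 80%/20% holdout by default)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def stump_accuracy(train_rows, train_labels, test_rows, test_labels, attr):
    """Fraction of test rows a one-level decision stump on `attr` classifies correctly."""
    votes = {}  # attribute value -> label counts, learned from the training part only
    for row, label in zip(train_rows, train_labels):
        votes.setdefault(row[attr], Counter())[label] += 1
    fallback = Counter(train_labels).most_common(1)[0][0]  # used for unseen values
    correct = 0
    for row, label in zip(test_rows, test_labels):
        counts = votes.get(row[attr])
        prediction = counts.most_common(1)[0][0] if counts else fallback
        correct += prediction == label
    return correct / len(test_labels)


if __name__ == "__main__":
    ranking = rank_attributes(TOY_ROWS, TOY_LABELS, ["alcohol", "smoker", "fever"])
    print("ranking:", ranking)  # ['fever', 'alcohol', 'smoker']
    tr_x, tr_y, te_x, te_y = holdout_split(TOY_ROWS, TOY_LABELS)
    print("holdout accuracy:", stump_accuracy(tr_x, tr_y, te_x, te_y, ranking[0]))  # 1.0
```

With 105 records, `train_frac=0.8` would give 84 training and 21 test records. A single holdout split depends on the random seed. k-fold cross-validation instead averages the score over several such splits.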
Journal description:
The journal Información Tecnológica is a service of the Center for Information Technology (CIT). Its sale to third parties and its total or partial reproduction for commercial purposes are prohibited. The journal publishes original papers submitted by their authors and accepted for publication by an Editorial Committee and a committee of referees. The Center for Information Technology is not responsible for the opinions expressed in the articles; that responsibility rests with their authors.