{"title":"Count Data Regression Modelling: An Application to Monkeypox Confirmed Cases","authors":"Divya Vijithaswan Nair, Rujuta Hadaye","doi":"10.18502/jbe.v9i2.14626","DOIUrl":"https://doi.org/10.18502/jbe.v9i2.14626","url":null,"abstract":"Introduction: Alongside COVID-19, some countries also faced an increase in the number of cases caused by the Monkeypox virus. The main aim of this research was to investigate whether count data regression models can be fitted to predict the daily incidence of confirmed Monkeypox cases. Methods: In this study we used two traditional count regression models, the Poisson regression model and the Negative Binomial regression model, each with identity and logarithmic link functions. Since our data were overdispersed, the Negative Binomial regression model with a logarithmic link function fitted better than the other models. The parameters were estimated using SPSS, version 23.0. Results: The Negative Binomial regression model with a logarithmic link fits the Monkeypox case data well. The model shows that the majority of the countries, including Brazil, Canada, France, Germany, Peru, Spain, the United Kingdom, and the United States of America, show a significant decrease in the number of cases over time. The prediction line plotted from this model predicts the daily Monkeypox cases reported by the different countries well. Conclusion: From our study, we concluded that count data regression models can be widely used to predict the incidence of any disease. Canada and Brazil have the largest and smallest slope coefficients, respectively, indicating the maximum and minimum decrease in the expected number of cases confirmed each day.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"137 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139453288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
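A minimal Python sketch of two ideas in the count-regression record above: checking for overdispersion (which motivates preferring the Negative Binomial model over the Poisson model) and reading a log-link slope as a multiplicative daily change in expected cases. The counts and coefficients are hypothetical. The NB2 variance form mu + alpha * mu^2 is an assumption, because the record does not state which parameterization SPSS used. This is not the authors' analysis.

```python
import math
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance / sample mean. A Poisson model forces variance == mean,
    so ratios well above 1 are a quick warning sign of overdispersion."""
    return variance(counts) / mean(counts)

def log_link_mean(beta0, beta1, day):
    """Expected daily count under a log link: mu = exp(beta0 + beta1 * day)."""
    return math.exp(beta0 + beta1 * day)

def nb2_variance(mu, alpha):
    """Assumed NB2 variance mu + alpha * mu**2; alpha = 0 reduces to the Poisson case."""
    return mu + alpha * mu ** 2

# Hypothetical daily counts, not data from the study.
daily_cases = [0, 3, 12, 1, 25, 7, 0, 40]
print(round(dispersion_ratio(daily_cases), 2))

# Under a log link, exp(beta1) is the multiplicative change in expected cases per day.
# A hypothetical slope of -0.05 corresponds to roughly a 4.9% decline per day.
b0, b1 = 3.0, -0.05
print(log_link_mean(b0, b1, 11) / log_link_mean(b0, b1, 10))  # ratio equals exp(b1), about 0.951
```

With a log link, a more negative slope means a steeper proportional decline, which is how slope coefficients can be compared across countries.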
{"title":"The Prevalence of Human Papilloma Virus Infection and Its High Risk Genotypes among Healthy Women in 28 Provinces in Iran; A Systematic Review and Meta-Analysis","authors":"M. Akbarzadeh-Jahromi, Negar Taheri, Babak Dashtdar, Nasim Taheri, Fatemeh Abiri, Marjan Zare","doi":"10.18502/jbe.v9i2.14625","DOIUrl":"https://doi.org/10.18502/jbe.v9i2.14625","url":null,"abstract":"Introduction: High-risk genotypes of Human Papilloma Virus (HPV) are responsible for up to 70% of invasive cervical cancers. We aimed to determine the national and provincial prevalence of total HPV and its high-risk genotypes, including HPV genotype 16 (HPV16), HPV genotype 18 (HPV18), and HPV genotypes other than 16 and 18 (HPV other genotypes), among healthy Iranian women. Methods: Iran, with 28 provinces, is located at latitude 32° 00' north and longitude 53° 00' east. All Persian and English studies reporting HPV infection based on cervical specimens were selected by searching the PubMed, Magiran, Scopus, and Irandoc databases and the Google Scholar search engine. Sample sizes and event rates were used to compute the overall event rates and 95% confidence intervals (95% C.I); fixed or random effects models and heterogeneity indices, including the Q-statistic (p-value) and the degree of heterogeneity (I2), were reported. The search was conducted up to February 29, 2022. Comprehensive Meta-Analysis 2.2.064 and ArcGIS 10.8.2 were used, with a significance level of 0.05. Results: The meta-analysis included nineteen studies with 258839 participants. The national meta-analysis resulted in a total HPV prevalence of 0.025 (95% C.I 0.016, 0.039); those of HPV16, HPV18, and HPV other genotypes were 0.032 (95% C.I 0.019, 0.051), 0.028 (95% C.I 0.019, 0.040), and 0.048 (95% C.I 0.033, 0.069), respectively. The provincial meta-analysis showed that total HPV prevalence was highest in Zanjan and Kerman (0.323 and 0.240, respectively); HPV16 prevalence was highest in Boushehr and Khozestan (0.298 and 0.253, respectively); HPV18 prevalence was highest in Tehran (0.089); and the prevalence of HPV other genotypes was highest in Khozestan (0.542). Conclusion: These results could help policymakers and health managers prioritize further implementation of screening strategies and health services in areas of greater need, such as Zanjan, Kerman, Khozestan, and Tehran.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"124 31","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139391512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing Heteroscedasticity in Correlated Binary Data: A Bayesian Mixed Effects Location Scale Approach","authors":"Parisa Rezanejad-Asl, Farid Zayeri, Abbas Hajifathali","doi":"10.18502/jbe.v9i2.14628","DOIUrl":"https://doi.org/10.18502/jbe.v9i2.14628","url":null,"abstract":"Introduction: The mixed effects logistic regression model is commonly used to analyse correlated binary data such as longitudinal data. The between- and within-subject variances are typically assumed to be homogeneous, but longitudinal data often show heterogeneity in these variances. This study proposes a Bayesian mixed effects location scale model to accommodate heteroscedasticity in binary data analysis. Methods: This study was carried out in two stages: first, a simulation study was used to evaluate the accuracy of the proposed model under the Bayesian approach, and then the proposed model was applied to real data. In the simulation study, the data were generated from the mixed effects location scale model with different correlations between the random location effect and the random scale effect and with different sample sizes. To evaluate the accuracy of the estimates, the root mean square error, bias, and coverage probability were calculated, and the deviance information criterion was used to select the appropriate model. Finally, we used this model to analyse uric acid (UA) levels of patients with haematological disorders. Results: The simulation results show accurate estimation of the model parameters, as well as of the correlation between the random location and scale effects. They also show that if a random scale effect is present in the data, it should be accounted for in the model; otherwise, the model is forced to assign the within-subject variation due to these subject random effects to the error term. The results from the real data are consistent with this. The odds of having normal UA levels increase by 26% per week. Due to the positive value of the covariance parameter, patients with a higher mean UA level show greater variation in UA levels. Furthermore, the significance of the covariates in the between-subject and within-subject variance models, as well as the significance of the random scale variance, determines the heterogeneity across subjects. Conclusion: The Bayesian mixed effects location scale model provides a useful tool for analysing correlated binary data with heteroscedasticity because it accounts for data correlation and models the mean and variance simultaneously. Furthermore, it improves the accuracy of statistical inference in longitudinal studies compared with classic mixed effects models.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"11 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139452513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determinants of Hospital Stay Duration Post-Colorectal Surgery","authors":"Gideon Addo, P. Ossei, Bismark Amponsah Yeboah, W. Ayibor, Raphael Doh-Nani, Seidu Mohammed, Michael Obuobi, Roselyn Assor Appau","doi":"10.18502/jbe.v9i2.14627","DOIUrl":"https://doi.org/10.18502/jbe.v9i2.14627","url":null,"abstract":"Introduction: Hospital length of stay (LOS) remains a vital metric for assessing patient outcomes and healthcare resource utilization. Given the substantial financial impact of diagnosing and treating colorectal anomalies, coupled with an increased susceptibility to postoperative complications, it is crucial to understand the factors affecting LOS following colorectal surgery. Our primary objective was to investigate the preoperative, intraoperative, and postoperative risk factors that substantially influence LOS following a colorectal procedure. Methods: This study analyzed data from a retrospective study of adults who underwent various colorectal surgeries (colostomy, ileostomy, small bowel resection, etc.) at the Cleveland Clinic Foundation (January 2005 - December 2014). Predictor variables were categorized into preoperative (patient demographics, medical history, comorbidities, lifestyle factors), intraoperative, and postoperative factors. LOS was grouped into short-term (SLOS) (≤ 7 days), medium-term (MLOS) (8-30 days), and long-term (LLOS) (> 30 days) stays. Multinomial logistic regression models were used to assess the effects of the predictors on LOS. Results: Among the 7874 patients, 50.7% were female, and the minimum age was 20 years. SLOS was observed in 61.1%, MLOS in 37.6%, and LLOS in 1.3% of patients. Advanced age was associated with prolonged LOS, possibly due to age-related health challenges such as weakened immune systems. Coagulopathy and fluid and electrolyte disorders raised the risk of MLOS and LLOS, likely due to complications such as significant bleeding and electrolyte imbalances. Longer surgery duration predicted longer LOS, raising the risk of LLOS and MLOS by 52% and 42%, respectively. Postoperative infections were associated with extended stays, possibly due to subsequent interventions, monitoring, and recovery delays. Conclusion: Our study revealed that key preoperative predictors of LOS included age, coagulopathy, fluid and electrolyte disorders, severe weight loss, and drug abuse. Notably, intraoperative factors such as surgical approach (open vs laparoscopic) and surgery duration, alongside postoperative complications including superficial and serious infections, significantly influenced LOS. By incorporating these insights into preoperative planning, clinicians could potentially develop tailored interventions to mitigate risk factors and enhance postoperative recovery, thus potentially reducing LOS and improving patient outcomes.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"102 19","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139391431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of HIV Prevalence among the Female Population in South India: A Bayesian Approach","authors":"Elangovan Arumugum, Vasna Joshua","doi":"10.18502/jbe.v9i2.14624","DOIUrl":"https://doi.org/10.18502/jbe.v9i2.14624","url":null,"abstract":"Introduction: The HIV Sentinel Surveillance (HSS) conducted by the National AIDS Control Organization (NACO) is the predominant data source for HIV estimation in India. While the HSS targets the key populations at risk of HIV infection, the National Family Health Survey (NFHS) measures community-based HIV prevalence. Improved HIV estimates in India have been attributed to the HIV prevalence data obtained from NACO-HSS and NFHS. Methods: Bayesian analysis was performed to determine the state-level prevalence of HIV among females in seven South Indian states. The analysis involved plotting the prior, likelihood, and posterior distributions, facilitating a visual assessment of the data. The HIV prevalence among females calculated from the NFHS (2015-16) survey data was used for the prior distributions. HIV prevalence among pregnant women obtained from the HIV Sentinel Surveillance 2019 was used for the likelihood. Bayesian analysis was performed in R (RStudio 2022.02.0). The posterior probability distribution was obtained from the prior distribution and the likelihood by applying Bayes' theorem. Graphical representations were produced with R's plotting functions. Kerala and Pondicherry were not included in the analysis due to zero or very low prevalence reported in both NFHS and HSS. Results: The Bayesian estimates of HIV prevalence among females were 0.38% [95% CI: 0.29 - 0.47] in Andhra Pradesh, 0.28% [95% CI: 0.23 - 0.35] in Karnataka, 0.27% [95% CI: 0.20 - 0.34] in Odisha, 0.27% [95% CI: 0.19 - 0.36] in Telangana, and 0.19% [95% CI: 0.15 - 0.24] in Tamil Nadu. Conclusion: Bayesian techniques offer a versatile and robust strategy for modelling and analysing HIV-related data.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"84 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139390502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
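The prior-times-likelihood step in the HIV record above can be illustrated with a grid approximation. This sketch assumes a Beta prior and a binomial likelihood. The record does not say which distributional forms the authors used in R, and all numbers below are hypothetical. For a Beta prior with a binomial likelihood, the exact posterior is Beta(a + x, b + n - x), which provides a check on the grid result.

```python
import math

def grid_posterior(prior_a, prior_b, positives, tested, grid_size=20001):
    """Grid approximation of Bayes' theorem for a prevalence p.

    Assumed prior: Beta(prior_a, prior_b). Assumed likelihood: binomial,
    `positives` out of `tested`. Returns grid points and normalized weights."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]  # midpoints in (0, 1)
    log_post = [
        (prior_a - 1 + positives) * math.log(p)
        + (prior_b - 1 + tested - positives) * math.log1p(-p)
        for p in grid
    ]
    peak = max(log_post)  # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]

def summarize(grid, weights, level=0.95):
    """Posterior mean and an equal-tailed credible interval from grid weights."""
    post_mean = sum(p * w for p, w in zip(grid, weights))
    lower_q, upper_q = (1 - level) / 2, (1 + level) / 2
    cum, lower, upper = 0.0, None, None
    for p, w in zip(grid, weights):
        cum += w
        if lower is None and cum >= lower_q:
            lower = p
        if cum >= upper_q:
            upper = p
            break
    return post_mean, lower, upper

# Hypothetical numbers: a survey-based prior centred near 0.3% prevalence,
# Beta(3, 997), updated with 10 positives among 4000 sentinel-site tests.
grid, weights = grid_posterior(3, 997, 10, 4000)
mean_p, lo, hi = summarize(grid, weights)
print(f"posterior mean {mean_p:.4%}, 95% interval {lo:.4%} to {hi:.4%}")
```

Plotting `weights` against `grid`, alongside the prior and the likelihood evaluated on the same grid, reproduces the kind of three-curve visual comparison the record describes.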
{"title":"Common Study Designs of Nutrition Clinical Trials: Review of the Basic Elements and the Pros and Cons","authors":"P. Mirmiran, H. Malmir, Z. Bahadoran","doi":"10.18502/jbe.v9i2.14623","DOIUrl":"https://doi.org/10.18502/jbe.v9i2.14623","url":null,"abstract":"Introduction: Nutrition Clinical Trials (NCTs) are pivotal in establishing causal links between nutritional interventions and chronic diseases. This review comprehensively examines prevalent clinical trial designs, emphasizing their strengths and limitations. The goal is to provide insights into the selection and optimization of these designs for dietary intervention studies. Methods: Various study designs in NCTs are explored, including quasi-experimental designs, double-blind randomized placebo-controlled trials for nutrient/functional food supplementation, community-based lifestyle interventions, pragmatic nutrition interventions, and field trial projects. The characteristics, advantages, and challenges of each design are discussed. Real examples are presented to illustrate how these designs can be tailored and optimized for dietary intervention studies. Results: Parallel randomized clinical trials are acknowledged as the gold standard, despite requiring substantial sample sizes and having inherent limitations. Cross-over NCTs are valuable for assessing temporary treatment effects while mitigating potential confounders and interpatient variability. However, they may not be suitable for acute diseases and progressive disorders, and attrition rates can be higher. Multi-arm randomized designs offer increased study power with a smaller sample size but necessitate more intricate design, analysis, and result reporting. Conclusion: Each study design in NCTs comes with its own set of strengths and limitations. The selection of an appropriate design should take its determinants and common considerations into account in order to provide robust evidence for establishing cause-and-effect associations or assessing the safety and efficacy of food products in nutrition research. This comprehensive understanding helps researchers make informed choices when planning and conducting nutrition clinical trials.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"114 48","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139390883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Geometric Generalized Birnbaum–Saunders Model with Long-Term Survivors","authors":"Ahmad Reza Baghestani, Farid Zayeri, Mojtaba Meshkat","doi":"10.18502/jbe.v9i1.13976","DOIUrl":"https://doi.org/10.18502/jbe.v9i1.13976","url":null,"abstract":"Introduction: A cure rate survival model was developed under the assumption that the number of competing causes for the event of interest follows a Geometric distribution and the time to the event of interest follows the Generalized Birnbaum–Saunders distribution. Methods: The Geometric Generalized Birnbaum–Saunders distribution was defined, and two useful representations of its density function were provided, which help establish some of its mathematical properties. Furthermore, the parameters of the model with cure rate were estimated using the maximum likelihood method. Results: Several simulations were performed for different sample sizes and censoring percentages, and a real data set from the medical area was analyzed. For the melanoma data set, according to the AIC and SBC selection criteria, the Geometric Generalized Birnbaum–Saunders model was preferred and was selected as the appropriate model in the present study. Conclusion: The Geometric Generalized Birnbaum–Saunders distribution is a highly flexible lifetime model that allows for different degrees of kurtosis and asymmetry. Given these advantages, the Geometric Generalized Birnbaum–Saunders model can be used as an appropriate alternative to explain or predict survival time in the presence of long-term survivors.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"2008 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135813943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beta-Geometric Regression for Modeling Count Data on First Antenatal Care Visit (ANC) with Application","authors":"Zainab M. Al-Balushi, Amadou Sarr, M Mazharul Islam","doi":"10.18502/jbe.v9i1.13977","DOIUrl":"https://doi.org/10.18502/jbe.v9i1.13977","url":null,"abstract":"Introduction: Little attention has been paid to modeling count data with the geometric distribution. Many real-life phenomena have a constant probability of first success. In practice, however, the probability of first success may vary, making simple geometric models unsuitable for such data. Any of several continuous distributions on the parameter space [0, 1] can be assumed to model the probability of first success. In this respect, the Beta distribution, defined on the standard unit interval [0, 1], is the most useful because it can accommodate a wide range of shapes. Thus, in this paper, by mixing the Beta and geometric distributions, we developed a Beta-geometric distribution for modeling count data and applied it to real-life count data on time to the first antenatal care (ANC) visit. Methods: Estimation of the distribution parameters by the method of moments, maximum likelihood estimation (MLE), and a Bayesian estimation approach is provided. Based on the Beta-geometric distribution, we developed a new Beta-geometric regression model for analyzing count data that follow the geometric distribution. The goodness of fit of the derived model was tested using real data on time to the first ANC visit. Results: The Beta-geometric distribution has a simple probability mass function (pmf) and is flexible enough to capture both the underdispersion and the overdispersion that may be present in count data. The proposed Beta-geometric regression model was found to fit the count data on the first ANC visit better than the simple geometric distribution or the Negative Binomial distribution. Conclusion: Unlike the Poisson or Negative Binomial distribution, the Beta-geometric distribution does not need an additional parameter to accommodate underdispersion or overdispersion and thus could be a flexible choice for analyzing any count data. The goodness-of-fit test showed that the Beta-geometric model fits the real data on time to the first ANC visit better than the geometric or Negative Binomial models.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"2012 35","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135814114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
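A sketch of the Beta-geometric probability mass function described in the record above. It assumes that Y counts failures before the first success (support 0, 1, 2, ...); if the paper instead counts trials until the first success, shift y by one. Averaging p(1 - p)^y over p ~ Beta(a, b) gives B(a + 1, b + y) / B(a, b). The regression model built on this distribution is not shown, and the shape values are hypothetical.

```python
import math

def log_beta(a, b):
    """log B(a, b) computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_geometric_pmf(y, a, b):
    """P(Y = y), y = 0, 1, 2, ... failures before the first success, where
    Y | p ~ Geometric(p) with P(Y = y | p) = p * (1 - p)**y and p ~ Beta(a, b).
    Integrating p out gives B(a + 1, b + y) / B(a, b)."""
    if y < 0:
        return 0.0
    return math.exp(log_beta(a + 1, b + y) - log_beta(a, b))

def beta_geometric_mean(a, b):
    """E[Y] = E[(1 - p) / p] = b / (a - 1), finite only when a > 1."""
    if a <= 1:
        return math.inf
    return b / (a - 1)

# Hypothetical shape parameters, not estimates from the paper.
a, b = 3.0, 2.0
probs = [beta_geometric_pmf(y, a, b) for y in range(6)]
print([round(p, 4) for p in probs])
```

Note that P(Y = 0) equals E[p] = a / (a + b), which is a quick sanity check on any implementation.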
{"title":"Prevalence of Restless Legs Syndrome in Rheumatoid Arthritis: A Systematic Review and Meta-Analysis","authors":"Mehrdad Bagherpour-kalo, Parvaneh Darabi, Ali Moghadas Jafari, Hamid Najafimehr, Kamal Azam, Mostafa Hosseini","doi":"10.18502/jbe.v9i1.13971","DOIUrl":"https://doi.org/10.18502/jbe.v9i1.13971","url":null,"abstract":"Introduction: Restless legs syndrome (RLS) is a common sensorimotor sleep disorder, and rheumatoid arthritis (RA) is an inflammatory autoimmune disease that causes disability. Previous studies showed that the prevalence of RLS varies across RA populations (13.2 – 68.4%), which raises the need for a pooled meta-analysis to obtain a more reliable estimate. Therefore, we aimed to perform a meta-analysis to assess the pooled prevalence of RLS in RA patients. Methods: The meta-analysis was performed according to the PRISMA checklist. The Embase, MEDLINE, Ovid, Web of Science, and Scopus databases were used for the systematic search, and eligible studies were analyzed using R version 4.0.3. We also performed sensitivity analyses to identify influential studies. Results: Of a total of 763 studies, 11 studies (3 from Europe, 4 from North America, and 4 from Asia) were suitable for synthesis. A total of 931 RA patients were identified, 300 of whom had symptoms of RLS. The pooled prevalence of RLS among people with RA from 11 studies was 34% (95% CI: 26-43%). The pooled prevalence of RLS in Europe, Asia, and North America was 48% (95% CI: 32-65%), 32% (95% CI: 18-45%), and 28% (95% CI: 15-42%), respectively. RLS prevalence was markedly higher in female RA patients (32%; 95% CI: 23-41%) than in male RA patients (3%; 95% CI: 2-5%). Conclusion: This systematic review and meta-analysis indicates that the pooled prevalence of RLS in RA patients was 34% and that female patients with RA were more prone to RLS than male patients.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"392 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135871983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
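The record above reports pooled prevalences with 95% CIs computed in R, but it does not name the pooling method. As one common choice, the following is a sketch of a DerSimonian-Laird random-effects pooled proportion on the logit scale. This is an assumed technique, not necessarily the one the authors used, and any study counts passed to it here are illustrative.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(z):
    return 1 / (1 + math.exp(-z))

def pooled_prevalence_dl(events, totals, z=1.96):
    """Random-effects pooled proportion on the logit scale (DerSimonian-Laird).

    Returns (pooled, ci_low, ci_high, tau2). A 0.5 continuity correction is
    applied only to studies with zero or all events."""
    ys, vs = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        ys.append(logit(x / n))
        vs.append(1 / x + 1 / (n - x))  # approximate variance of a logit proportion
    w = [1 / v for v in vs]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(ys)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2

# Illustrative counts only (RLS cases, RA patients per study), not the reviewed studies.
print(pooled_prevalence_dl([12, 40, 25], [60, 90, 110]))
```

Pooling on the logit scale keeps the confidence limits inside (0, 1). Subgroup estimates, such as by region or sex, would come from running the same function on each subset of studies.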
{"title":"Random-Splitting Random Forest with Multiple Mixed-Data Covariates","authors":"Mohammad Fayaz, Alireza Abadi, Soheila Khodakarim","doi":"10.18502/jbe.v9i1.13974","DOIUrl":"https://doi.org/10.18502/jbe.v9i1.13974","url":null,"abstract":"Introduction: Bagging (BG) and random forest (RF) are well-known supervised statistical learning methods based on classification and regression trees. BG and RF can handle different types of responses, such as categorical and continuous responses. In many statistical applications, the data consist of curves, time series, functional data, or observations that are related to each other through their domain. RF methods have been extended in the literature to some settings with functional data as covariates or responses. Among these extensions, random splitting is used to summarize the functional data into multiple related summary statistics, such as the average. Methods: This research article extends this method and introduces mixed-data BG (MD-BG) and RF (MD-RF) algorithms for multiple functional and non-functional (mixed or hybrid) covariates, and it calculates a variable importance plot (VIP) for each covariate. Results: The main difference between MD-BG and MD-RF lies in how the covariates are chosen: the former keeps all covariates in the model, whereas the latter uses a random sample of covariates. MD-RF helps to unmask the most important parts of the functional covariates and the most important non-functional covariates. Conclusion: We applied our methods to the DTI and Tecator datasets and compared their performance for continuous and categorical responses using the R package (“RSRF”) that we developed and made available on GitHub.","PeriodicalId":34310,"journal":{"name":"Journal of Biostatistics and Epidemiology","volume":"2015 29","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135813141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
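The record above describes random splitting: a functional covariate is summarized into several segment averages, which are then combined with non-functional covariates. The following minimal sketch covers only that feature-construction step. The helper names and data are hypothetical, the code does not use the RSRF package API, and it omits tree growing and variable importance. In a full forest, each tree could plausibly draw its own split points.

```python
import random

def draw_split_points(n_points, n_splits, rng):
    """Draw distinct interior cut indices for a curve observed at n_points grid points."""
    return sorted(rng.sample(range(1, n_points), n_splits))

def segment_means(curve, cuts):
    """Average the discretized curve within each segment defined by the cut indices."""
    bounds = [0] + list(cuts) + [len(curve)]
    return [sum(curve[lo:hi]) / (hi - lo) for lo, hi in zip(bounds, bounds[1:])]

def mixed_design_row(curve, cuts, scalar_covariates):
    """One feature row: functional summaries followed by non-functional covariates."""
    return segment_means(curve, cuts) + list(scalar_covariates)

# Hypothetical data: two subjects, each with a 10-point curve and two scalar covariates.
rng = random.Random(0)
cuts = draw_split_points(10, 2, rng)  # apply the same cuts to every subject
curves = [[float(t) for t in range(10)], [1.0] * 10]
scalars = [[54, 1], [61, 0]]
rows = [mixed_design_row(c, cuts, s) for c, s in zip(curves, scalars)]
print(cuts, rows)
```

Every subject must share the same cut points so that each column means the same thing across rows. After that, the rows can be passed to any ordinary tree-based learner.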