Multimodal distribution and its impact on the accurate assessment of spermatozoa morphological data: Lessons from machine learning
D. Stefanovski, M. Schulze, G.C. Althouse
Animal Reproduction Science, vol. 269, Article 107564 (published 2024-10-01)
Journal article; JCR Q1, Agriculture, Dairy & Animal Science; impact factor 2.2
DOI: 10.1016/j.anireprosci.2024.107564
https://www.sciencedirect.com/science/article/pii/S0378432024001556
Citations: 0
Abstract
Objective assessment of sperm morphology is an essential component of evaluating ejaculate quality. Because of economic constraints, investigators often conduct observational studies instead of experimental ones, which offer the greatest statistical power; observational studies yield more heterogeneous data regardless of the number of data sources (barns/farms). Such data inevitably inflate the variance of estimates, which reduces a study's statistical power. In this article, we describe finite mixture modeling (FMM), a statistical method that, given the data and an assumed number of subclasses, classifies the observations into two or more homogeneous component distributions and estimates the fraction of the cohort that each component represents. The goal is to use statistical methods that account for the heterogeneous sources of variance in the sample. Data on the cytoplasmic droplet rate (n = 1559) were simulated from a figure in a previous publication. A bimodal distribution with two latent classes best described the simulated data. Post hoc estimation assigned about 80 % of observations to latent class 1 and 20 % to latent class 2. FMM identified a cutoff point of 8.7 % separating the two classes. Finally, for the total cohort, FMM reduced the estimated standard error by 40 % compared with standard methods. In conclusion, FMM accounted for the heterogeneity in the data and therefore yielded lower variance estimates than standard methods, increasing the statistical power of the cohort.
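To make the FMM idea concrete, here is a minimal Python sketch that fits a two-class mixture by expectation-maximization (EM). It assumes both latent classes are normally distributed; the abstract does not say which component distributions the authors used. The function names (`fit_two_class_mixture`, `class_cutoff`, `bic`), the use of BIC to choose between one and two classes, the definition of the cutoff as the value where both classes are equally probable, and the simulated parameters are all illustrative assumptions, not the authors' code or data.

```python
import math
import random
import statistics


def normal_logpdf(x, mean, sd):
    """Log-density of a normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def _log_joint(x, w, mu, sd):
    """log(w * N(x; mu, sd^2)): log of class weight times class density."""
    return math.log(w) + normal_logpdf(x, mu, sd)


def _logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def fit_two_class_mixture(data, max_iter=500, tol=1e-9):
    """Fit a two-class Gaussian mixture by EM (assumed model).

    Returns (weights, means, sds, log_likelihood); class 1 has the lower mean.
    """
    xs = sorted(data)
    n, half = len(xs), len(xs) // 2
    # Crude start: split the sorted data at the median.
    means = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    sds = [statistics.pstdev(xs)] * 2
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of class 1 for every observation.
        resp, ll = [], 0.0
        for x in xs:
            a = _log_joint(x, weights[0], means[0], sds[0])
            b = _log_joint(x, weights[1], means[1], sds[1])
            log_total = _logsumexp2(a, b)
            ll += log_total
            resp.append(math.exp(a - log_total))
        # M-step: re-estimate class sizes, means and standard deviations.
        for k, rk in enumerate((resp, [1.0 - r for r in resp])):
            nk = sum(rk)
            weights[k] = nk / n
            means[k] = sum(r * x for r, x in zip(rk, xs)) / nk
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(rk, xs)) / nk
            sds[k] = math.sqrt(max(var, 1e-12))
        if ll - prev_ll < tol:  # EM log-likelihood never decreases
            break
        prev_ll = ll
    order = sorted(range(2), key=lambda k: means[k])
    weights = [weights[k] for k in order]
    means = [means[k] for k in order]
    sds = [sds[k] for k in order]
    ll = sum(_logsumexp2(_log_joint(x, weights[0], means[0], sds[0]),
                         _log_joint(x, weights[1], means[1], sds[1])) for x in xs)
    return weights, means, sds, ll


def class_cutoff(weights, means, sds, iters=200):
    """Bisection for the value between the means where both classes are equally likely."""
    def log_odds(x):  # log P(class 1 | x) - log P(class 2 | x)
        return (_log_joint(x, weights[0], means[0], sds[0])
                - _log_joint(x, weights[1], means[1], sds[1]))
    lo, hi = means[0], means[1]
    if log_odds(lo) <= 0 or log_odds(hi) >= 0:
        return None  # no crossing between the class means
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_odds(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bic(log_likelihood, n_params, n):
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n)


# Hypothetical droplet rates (%): about 80 % of samples near 5 %, 20 % near 15 %.
rng = random.Random(2024)
sample = [rng.gauss(5.0, 1.5) if rng.random() < 0.8 else rng.gauss(15.0, 4.0)
          for _ in range(1559)]
weights, means, sds, ll2 = fit_two_class_mixture(sample)
ll1 = sum(normal_logpdf(x, statistics.fmean(sample), statistics.pstdev(sample))
          for x in sample)
bic1, bic2 = bic(ll1, 2, len(sample)), bic(ll2, 5, len(sample))
cutoff = class_cutoff(weights, means, sds)
print(f"class sizes: {weights[0]:.2f} / {weights[1]:.2f}, cutoff: {cutoff:.1f} %")
print(f"BIC, 1 class: {bic1:.0f}; 2 classes: {bic2:.0f}")
```

The fitted mixing weights correspond to the fractional class sizes described in the abstract. Because droplet rates are bounded percentages, a real analysis might prefer a different component family or a transformed scale; the normal components here are only a convenient assumption.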
About the journal:
Animal Reproduction Science publishes results from studies relating to reproduction and fertility in animals. This includes both fundamental research and applied studies, including management practices that increase our understanding of the biology and manipulation of reproduction. Manuscripts should examine in depth the mechanisms involved in the reported research, rather than merely describe the findings. The focus is on animals that are useful to humans, including food- and fibre-producing; companion/recreational; captive; and endangered species, including zoo animals, but excluding laboratory animals unless the results of the study provide new information that impacts the basic understanding of the biology or manipulation of reproduction.
The journal's scope includes the study of reproductive physiology and endocrinology, reproductive cycles, natural and artificial control of reproduction, preservation and use of gametes and embryos, pregnancy and parturition, infertility and sterility, diagnostic and therapeutic techniques.
The Editorial Board of Animal Reproduction Science has decided not to publish papers that examine only the in vitro development of oocytes and embryos; however, it will consider papers that include in vitro studies in which the source of the oocytes and/or the development of the embryos beyond the blastocyst stage is part of the experimental design.