Multimodal Convolutional Neural Networks for Sperm Motility and Concentration Predictions
Voon Hueh Goh, Muhammad Asraf Mansor, M. A. As’ari, L. Ismail
Malaysian Journal of Fundamental and Applied Sciences, published 2024-04-24
DOI: https://doi.org/10.11113/mjfas.v20n2.3263
Abstract
Semen analysis is a primary investigation for male infertility, and manual semen analysis is the conventional way to perform it. However, manual analysis has known limitations in accuracy and precision that arise from noncompliance with guidelines and procedures. Sperm motility and concentration are the main indicators of pregnancy and conception rates, so they were selected as the parameters to predict. Convolutional neural networks (CNNs) have benefited the computer vision industry in recent years and are widely applied in computer vision research. In this paper, a three-dimensional CNN (3DCNN) was designed to extract motion and temporal features, which are vital for sperm motility prediction. For sperm concentration, because two-dimensional CNNs (2DCNNs) are efficient at recognizing and extracting spatial features, the well-established Residual Network (ResNet) architecture was adopted and customized. Multimodal learning aggregates features learnt by different deep learning architectures that operate on different modalities, which can give a deep learning model better insight into its task. Hence, a multimodal deep learning architecture was designed to receive both image-based input (frames extracted from video samples) and video-based input (stacked frames pre-processed from video samples), providing well-extracted spatial and temporal features for sperm parameter prediction. The results obtained with the proposed methodology surpass those of similar research works that used deep learning approaches. For sperm motility, the best average mean absolute error (MAE) achieved was 8.048, and for sperm concentration the model obtained a Pearson’s correlation coefficient (RP) of 0.853.
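The two evaluation metrics named in the abstract, MAE and Pearson's correlation coefficient, can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions, not the authors' evaluation code. The function names and the sample values are hypothetical, and the abstract does not state how the reported MAE was averaged.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """Mean of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Illustrative values only; these are NOT data from the paper.
    true_motility = [55.0, 30.0, 72.0, 41.0]
    pred_motility = [50.0, 38.0, 70.0, 47.0]
    print("MAE:", mean_absolute_error(true_motility, pred_motility))
    print("Pearson r:", pearson_r(true_motility, pred_motility))
```

A lower MAE means predictions lie closer to the reference values in absolute terms, while a Pearson coefficient near 1 means predictions rise and fall with the reference values even if they are offset or scaled.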