Interpretable multi-morphology and multi-scale microalgae classification based on machine learning
Huchao Yan, Xinggan Peng, Chao Wang, Ao Xia, Yun Huang, Xianqing Zhu, Jingmiao Zhang, Xun Zhu, Qiang Liao
Algal Research-Biomass Biofuels and Bioproducts, vol. 84, Article 103812, published 2024-12-01. DOI: 10.1016/j.algal.2024.103812
https://www.sciencedirect.com/science/article/pii/S2211926424004247
Citations: 0
Abstract
Mixed microalgae with multiple morphologies and scales are widely distributed in natural and artificial systems. An efficient approach to classifying mixed microalgae is urgently needed for monitoring natural water systems and for microalgae bioprocesses such as wastewater treatment, carbon dioxide capture, and the prevention of harmful algal blooms. This study establishes numerical feature datasets for pure and mixed cultures of multi-morphic microalgae ranging in size from 5 to 500 μm. Because a large number of input features increases model complexity and computational cost, the feature space was reduced from 24 to 11 dimensions using the Pearson coefficient matrix and principal component analysis, limiting the impact of unimportant factors. The results indicate that the ensemble model classifies significantly better than the linear and nonlinear models. The random forest optimized by grid search achieves average F1 scores of 0.952 and 0.943 for pure and mixed microalgae, respectively, which are 2.2 % and 1.0 % higher than those obtained without optimization. Combining Shapley Additive exPlanations (SHAP) theory with the ensemble model to analyze the critical factors for classification shows that, among all the numerical features of microalgae images, texture features play a crucial role.
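Two of the steps named in the abstract, screening features with a Pearson coefficient matrix and reporting an average F1 score, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The abstract gives no correlation threshold, so the 0.9 cutoff is assumed, as is the greedy keep-first rule. The abstract does not say how the F1 score was averaged, so an unweighted (macro) average over classes is assumed. The feature names and values are hypothetical. The principal component analysis, random forest, grid search, and SHAP steps are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (norm_x * norm_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is <= threshold.

    `columns` maps a feature name to its list of values. The threshold
    is an assumed value; the abstract does not state one.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy features: "scaled_area" is a linear function of "area",
# so it is redundant; "contrast" stands in for a texture feature.
features = {
    "area": [1, 2, 3, 4, 5],
    "scaled_area": [3, 5, 7, 9, 11],
    "contrast": [2, 1, 4, 1, 3],
}
selected = drop_correlated(features)

# Hypothetical labels for two classes of microalgae.
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Here `selected` keeps `area` and `contrast` and drops `scaled_area`, because its correlation with `area` exceeds the assumed threshold. In a fuller pipeline the retained features would then go through principal component analysis before model fitting.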
About the journal:
Algal Research is an international phycology journal covering all areas of emerging technologies in algae biology, biomass production, cultivation, harvesting, extraction, bioproducts, biorefinery, engineering, and econometrics. Algae is defined to include cyanobacteria, microalgae, and protists and symbionts of interest in biotechnology. The journal publishes original research and reviews within the following scope: algal biology, including but not limited to phylogeny, biodiversity, molecular traits, metabolic regulation, and genetic engineering; algal cultivation, e.g., phototrophic, heterotrophic, and mixotrophic systems; algal harvesting and extraction systems; biotechnology to convert algal biomass and components into biofuels and bioproducts, e.g., nutraceuticals, pharmaceuticals, animal feed, and plastics; and algal products and their economic assessment.