Interpretable multi-morphology and multi-scale microalgae classification based on machine learning

IF 4.6 | Zone 2, Biology | Q1 | Biotechnology & Applied Microbiology
Huchao Yan, Xinggan Peng, Chao Wang, Ao Xia, Yun Huang, Xianqing Zhu, Jingmiao Zhang, Xun Zhu, Qiang Liao
Algal Research-Biomass Biofuels and Bioproducts, Volume 84, Article 103812. Published 2024-12-01. DOI: 10.1016/j.algal.2024.103812
https://www.sciencedirect.com/science/article/pii/S2211926424004247
Citations: 0

Abstract

Mixed microalgae of multiple morphologies and scales are widely distributed in natural and artificial systems. An efficient approach to classifying mixed microalgae is urgently needed for natural water system monitoring and for microalgae bioprocesses such as wastewater treatment, carbon dioxide capture, and prevention of harmful algal blooms. This study establishes numerical feature datasets for pure and mixed cultures of multi-morphic microalgae ranging in size from 5 to 500 μm. Because a large number of input features increases model complexity and computational cost, the feature space was reduced from 24 to 11 dimensions using the Pearson correlation coefficient matrix and principal component analysis, limiting the impact of unimportant factors. The results indicate that the ensemble model classifies significantly better than the linear and nonlinear models. The random forest optimized by grid search achieves average F1 scores of 0.952 and 0.943 for pure and mixed microalgae, respectively, which are 2.2 % and 1.0 % higher than those without optimization. Combining Shapley Additive exPlanations (SHAP) with the ensemble model to analyze the critical factors for microalgae classification shows that texture features play a crucial role among the numerical features of microalgae images.
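
The dimensionality reduction described in the abstract (24 features reduced to 11 using a Pearson matrix and principal component analysis) can be illustrated with a minimal sketch. The abstract does not say how the two steps were combined, so everything below is an assumption made for illustration: the synthetic data, the 0.9 correlation threshold, the greedy filtering rule, and the helper names `drop_correlated` and `pca_project` are hypothetical and are not the authors' method.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Greedily keep columns whose |Pearson r| with every kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature Pearson matrix
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def pca_project(X, n_components):
    """Standardize X and project it onto its leading principal components via SVD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)  # explained variance ratio per component
    return Z @ vt[:n_components].T, explained[:n_components]

# Synthetic stand-in for a 24-feature table (NOT the paper's data):
# 16 independent features plus 8 noisy copies of the first 8.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 16))
copies = base[:, :8] + 0.05 * rng.normal(size=(200, 8))
X = np.hstack([base, copies])

kept = drop_correlated(X, threshold=0.9)          # removes the redundant copies
X_reduced, explained = pca_project(X[:, kept], n_components=11)
print(X.shape, len(kept), X_reduced.shape)
```

In this sketch the correlation filter removes near-duplicate features first, and PCA then compresses what remains to 11 dimensions. A real pipeline would fit both steps on the training split only and reuse the fitted transform on held-out data.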


Source journal
Algal Research-Biomass Biofuels and Bioproducts (Biotechnology & Applied Microbiology)
CiteScore: 9.40
Self-citation rate: 7.80%
Articles published: 332
About the journal: Algal Research is an international phycology journal covering all areas of emerging technologies in algae biology, biomass production, cultivation, harvesting, extraction, bioproducts, biorefinery, engineering, and econometrics. Algae is defined to include cyanobacteria, microalgae, and protists and symbionts of interest in biotechnology. The journal publishes original research and reviews within the following scope: algal biology, including but not limited to phylogeny, biodiversity, molecular traits, metabolic regulation, and genetic engineering; algal cultivation, e.g., phototrophic, heterotrophic, and mixotrophic systems; algal harvesting and extraction systems; biotechnology to convert algal biomass and components into biofuels and bioproducts, e.g., nutraceuticals, pharmaceuticals, animal feed, and plastics; and algal products and their economic assessment.