Mining of heterogeneous time series information for predicting chlorophyll accumulation in oceans

IF 3.8 | Zone 3, Computer Science | JCR Q1, Computer Science, Hardware & Architecture
Atharva Ramgirkar, Vadiraj Rao, Janhavi Talhar, Tusar Kanti Mishra, Swathi Jamjala Narayanan, Shashank Mouli Satapathy, Boominathan Perumal
DOI: 10.1016/j.suscom.2024.100980
Journal: Sustainable Computing-Informatics & Systems, vol. 42, Article 100980
Published: 2024-02-27
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S2210537924000258
Citations: 0

Abstract

Mining of heterogeneous time series information for predicting chlorophyll accumulation in oceans

Harmful algal blooms cause environmental harm, financial losses, and disease epidemics. Because algal blooms cannot be eradicated, the best option is to foresee their growth and regulate it. Machine learning algorithms can be used to forecast their presence and to classify the threat that each concentration level presents. In this work, a dataset collected from the Santa Monica region of the US is analyzed and processed to predict algae concentration. Three models, multiple linear regression, Regression Gradient Boosting Decision Tree (RGBDT), and Hidden Markov Model (HMM), are applied to predict the chlorophyll (Chl-a) content, which serves as a proxy for the presence of algae in the water. The results show that, for prediction, the multiple linear regression model outperforms the RGBDT algorithm. For modeling chlorophyll with the HMM, the parameter bbp555.00_sd performs best among aot443.00_sd, kd490.00_sd, poc_sd and pic_sd. The multiple linear regression model gave an adjusted R-squared of 0.94; the parameter pic_sd had the lowest variance inflation factor (VIF), 1.78, followed by aot and bbp, both with VIF < 5 (2.28 and 4.95 respectively). The output of the HMM-based model is the probability of the presence of chlorophyll given the presence of each variable individually. Here bbp has the highest probability, 0.405, implying roughly a 40% chance of chlorophyll in the presence of bbp.
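
The two regression diagnostics quoted in the abstract, adjusted R-squared and VIF, can be illustrated with a minimal sketch. This is not the authors' code or data: the predictors below are synthetic, the column roles are hypothetical, and plain NumPy least squares stands in for whatever library the paper used. It only shows how the quantities are defined, with VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(X, y):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n, p = X.shape
    return 1.0 - (1.0 - r_squared(X, y)) * (n - 1) / (n - p - 1)


def vif(X):
    """Variance inflation factor for each column of X."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # all predictors except column j
        factors.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(factors)


# Synthetic stand-ins for three predictors (hypothetical, not the paper's data).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)              # independent of the others: VIF near 1
x3 = x1 + 0.5 * rng.normal(size=n)   # correlated with x1: inflated VIF
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - 1.0 * x2 + 0.1 * rng.normal(size=n)

adj_r2 = adjusted_r_squared(X, y)
vifs = vif(X)
print("adjusted R^2:", round(adj_r2, 3))
print("VIF per predictor:", np.round(vifs, 2))
```

A VIF close to 1 means a predictor is nearly uncorrelated with the others, and values below 5 are commonly treated as acceptable, which is the threshold the abstract refers to.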

Source journal
Sustainable Computing-Informatics & Systems
Categories: Computer Science, Hardware & Architecture; Computer Science, Information Systems
CiteScore: 10.70
Self-citation rate: 4.40%
Articles published: 142
About the journal: Sustainable computing is a rapidly expanding research area spanning computer science and engineering, electrical engineering, and other engineering disciplines. The aim of Sustainable Computing: Informatics and Systems (SUSCOM) is to publish research findings related to energy-aware and thermal-aware management of computing resources. Equally important is a spectrum of related research issues, such as applications of computing that can have ecological and societal impacts. SUSCOM publishes original and timely research papers and survey articles on power, energy, temperature, and environment related topics of current importance to readers. SUSCOM has an editorial board comprising prominent researchers from around the world and selects competitively evaluated, peer-reviewed papers.