Numeric Prediction of Dissolved Oxygen Status Through Two-Stage Training for Classification-Driven Regression

Pengfei Guo, Han Liu, Shuangyin Liu, Longqin Xu
2019 International Conference on Machine Learning and Cybernetics (ICMLC), July 2019
DOI: 10.1109/ICMLC48188.2019.8949196
Citations: 3

Abstract

The dissolved oxygen level in aquaculture is an important indicator of the quality of the culture environment and of how well aquatic products are growing. In a machine learning setting, this indicator can be estimated by defining a regression problem that aims at numerical prediction of the dissolved oxygen status. Most popular machine learning algorithms, however, were designed for classification tasks. To adopt these algorithms effectively for numerical prediction, this paper proposes a two-stage training approach that transforms a regression problem into a classification problem and then transforms it back into a regression problem. In particular, at the first stage, unsupervised discretization is applied to transform the numeric target attribute into a discrete (nominal) one with several intervals, so that popular machine learning algorithms can be used, in a classification setting, to predict the interval to which an instance belongs. At the second stage, based on the first-stage classification result, some of the training instances within the predicted interval are selected for training towards numerical prediction of each instance's target value. An experimental study investigates the general effectiveness of popular learning algorithms in this numerical prediction task and analyzes how increasing the number of training instances selected at the second stage affects the final prediction performance. The results show that decision tree learning and neural networks lead to better and more stable performance than Naive Bayes, K-Nearest Neighbours and Support Vector Machine.
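The two-stage idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: equal-width binning stands in for the unsupervised discretization method (the abstract does not name one), a hand-written k-nearest-neighbour vote stands in for the stage-one classifiers compared in the paper, and the second stage is assumed to select the `m` training instances of the predicted interval that lie closest to the query and average their target values. The function names `equal_width_cuts`, `knn_classify` and `two_stage_predict`, the parameters `n_bins`, `k_cls` and `m`, and the toy data are all hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter
from statistics import mean


def equal_width_cuts(values, n_bins):
    """Interior cut points splitting [min, max] into n_bins equal-width intervals.

    Assumption: equal-width binning is used as the unsupervised discretizer.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def knn_classify(X, labels, x, k=3):
    """Stage-1 stand-in classifier: majority vote among the k nearest training rows."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def two_stage_predict(X, y, x, n_bins=3, k_cls=3, m=2):
    """Classify x into a target interval, then regress from m instances in that interval."""
    # Stage 1a: discretize the numeric target into interval labels 0 .. n_bins-1.
    cuts = equal_width_cuts(y, n_bins)
    labels = [bisect_right(cuts, v) for v in y]
    # Stage 1b: predict the interval of the new instance (a classification task).
    interval = knn_classify(X, labels, x, k_cls)
    # Stage 2 (hypothetical selection rule): keep the m training rows of that
    # interval nearest to x and average their numeric target values.
    members = [i for i, lab in enumerate(labels) if lab == interval]
    selected = sorted(members, key=lambda i: math.dist(X[i], x))[:m]
    return interval, mean(y[i] for i in selected)


# Synthetic toy data (one feature, made-up target values), for illustration only.
X = [[0], [1], [2], [10], [11], [12], [20], [21], [22]]
y = [1.0, 1.2, 1.4, 5.0, 5.2, 5.4, 9.0, 9.2, 9.4]
print(two_stage_predict(X, y, [11.4]))  # interval 1, value close to 5.3
```

Varying `m` loosely mirrors the paper's question of how the number of instances selected at the second stage affects the final prediction: in this sketch, a larger `m` averages over more of the predicted interval and, once it covers every member, returns the interval's mean target value.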