Numeric Prediction of Dissolved Oxygen Status Through Two-Stage Training for Classification-Driven Regression

2019 International Conference on Machine Learning and Cybernetics (ICMLC). Pub Date: 2019-07-01. DOI: 10.1109/ICMLC48188.2019.8949196

Pengfei Guo, Han Liu, Shuangyin Liu, Longqin Xu


Citations: 3

Abstract

Dissolved oxygen in aquaculture is an important indicator of the quality of the culture environment and of how well aquatic products have grown. In machine learning terms, monitoring this indicator can be framed as a regression problem that aims at numerical prediction of the dissolved oxygen status. However, the vast majority of popular machine learning algorithms were designed for classification tasks. To adopt these algorithms effectively for numerical prediction, this paper proposes a two-stage training approach that transforms the regression problem into a classification problem and then transforms it back into a regression problem. At the first stage, unsupervised discretization of continuous attributes transforms the numeric target attribute into a discrete (nominal) one with several intervals, so that popular machine learning algorithms can predict, as a classification task, the interval to which an instance belongs. At the second stage, based on the first-stage classification result, some of the instances within the predicted interval are selected for training towards numerical prediction of the target attribute value of each instance. An experimental study investigates the general effectiveness of popular learning algorithms in this numerical prediction task and analyzes how increasing the number of training instances selected at the second stage affects the final prediction performance. The results show that decision tree learning and neural networks lead to better and more stable performance than Naive Bayes, K Nearest Neighbours and Support Vector Machine.
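The two-stage idea can be illustrated with a minimal Python sketch. The abstract does not state which discretization scheme is used, how instances are selected at the second stage, or which second-stage learner is applied, so everything specific here is an assumption made for illustration: equal-width binning as the unsupervised discretization, a k-nearest-neighbour vote as the stage-one classifier, and a mean over the closest instances inside the predicted interval as the stage-two predictor. The function names, parameters, and toy data are hypothetical and are not taken from the paper.

```python
import math


def equal_width_edges(values, n_bins):
    """Unsupervised equal-width discretization: return the n_bins - 1 inner cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def to_interval(value, edges):
    """Index of the interval that a numeric target value falls into."""
    return sum(value >= edge for edge in edges)


def knn_classify(X, labels, x, k):
    """Stage 1 (assumed learner): majority vote among the k nearest training instances."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
    votes = {}
    for i in nearest:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    # Ties are broken deterministically in favour of the smallest interval index.
    return max(sorted(votes), key=votes.get)


def two_stage_predict(X, y, x, n_bins=4, k_class=5, m_select=3):
    """Return (predicted interval, numeric prediction) for the query instance x."""
    # Turn the numeric target into a nominal one with n_bins intervals.
    edges = equal_width_edges(y, n_bins)
    labels = [to_interval(v, edges) for v in y]

    # Stage 1: classification, i.e. predict the interval the instance belongs to.
    interval = knn_classify(X, labels, x, k_class)

    # Stage 2: select some instances inside the predicted interval (here the
    # m_select closest ones, an assumed selection rule) and predict their mean target.
    members = [i for i, label in enumerate(labels) if label == interval]
    members.sort(key=lambda i: math.dist(X[i], x))
    chosen = members[:m_select]
    return interval, sum(y[i] for i in chosen) / len(chosen)


# Toy data (synthetic, not dissolved-oxygen measurements): target = 2 * feature.
X_toy = [[float(v)] for v in range(20)]
y_toy = [2.0 * v for v in range(20)]

print(two_stage_predict(X_toy, y_toy, [7.2]))  # (1, 14.0)
```

In this sketch, `m_select` plays the role of the number of training instances chosen at the second stage, which is the quantity whose effect on prediction performance the experimental study examines.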

