Random forest regressor applied in prediction of percentages of calibers in mango production

Impact factor: 7.4 | JCR: Q1 | Category: Agriculture, Multidisciplinary
Bernard Roger Ramos Collin, Danilo de Lima Alves Xavier, Thiago Magalhães Amaral, Ana Cristina G. Castro Silva, Daniel dos Santos Costa, Fernanda Magalhães Amaral, Jefferson Tales Oliva
Information Processing in Agriculture, Volume 12, Issue 3, Pages 370-383. Journal article, published 2025-09-01.
DOI: 10.1016/j.inpa.2024.12.002
https://www.sciencedirect.com/science/article/pii/S2214317324000854
Citations: 0

Abstract

Random forest regressor applied in prediction of percentages of calibers in mango production
The importance of identifying the caliber in advance lies in knowing the exact quantity of mangoes, by weight, that a given crop season (one complete mango cycle, from growth to fruit harvest) will provide. This study uses the Random Forest method to predict the percentage distribution of the calibers of four mango varieties from Brazil's largest exporter and producer. The approach comprised four steps: data collection, data preprocessing, predictive model building, and model evaluation. The data cover three crop seasons: 2019, 2020, and 2021. Each row corresponds to a plot and gives the percentage of a given caliber at the end of a crop season. The dataset has 5503 rows, of which 37.33%, 31.47%, 22.76%, and 8.44% belong to the Keitt, Tommy Atkins, Kent, and Palmer varieties, respectively. The variables are Productivity, Nitrogen (N), Number of plants (units), Plants/hectare, Month of floral induction, Zinc (Zn), Sulfur (S), Boron (B), Caliber, and Percentage of caliber. Python was used to preprocess the data, perform exploratory analysis, and build the Random Forest Regressor models, with the code written and run in Visual Studio Code. The Python libraries used include pandas for data handling and SciPy for removing outliers that could bias the data. The YellowBrick library was used for feature selection. Four Random Forest (RF) regression models were created, one for each mango variety in the dataset. The models gave satisfactory results for Kent, Keitt, Tommy Atkins, and Palmer mangoes, with R² values of 87.29%, 74.37%, 87.69%, and 62.75%, respectively. During feature selection, nitrogen (N) proved highly important in all the models, highlighting the relevance of this element to fruit formation.
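The workflow described in the abstract (outlier removal, one Random Forest model per variety, evaluation by R²) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the data are synthetic, the column names are hypothetical, the z-score cutoff of 3 is an assumed outlier rule (the abstract only says SciPy was used), and scikit-learn's built-in `feature_importances_` stands in for the YellowBrick feature selection step.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: every value below is made up so the sketch runs.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "variety": rng.choice(["Keitt", "Kent"], size=n),
    "productivity": rng.normal(30, 5, n),
    "nitrogen": rng.normal(12, 2, n),
    "plants_per_hectare": rng.normal(400, 50, n),
    "caliber": rng.choice([6, 8, 10, 12], size=n),
})
# Assumed toy relationship between the inputs and the caliber percentage.
df["pct_caliber"] = (
    20 + 2.0 * df["nitrogen"] - 1.5 * df["caliber"] + rng.normal(0, 1, n)
)

features = ["productivity", "nitrogen", "plants_per_hectare", "caliber"]
target = "pct_caliber"


def remove_outliers(frame, cols, z_max=3.0):
    """Drop rows whose absolute z-score reaches z_max in any given column."""
    z = np.abs(stats.zscore(frame[cols].to_numpy(dtype=float), axis=0))
    return frame.loc[(z < z_max).all(axis=1)]


models, scores = {}, {}
for variety, group in df.groupby("variety"):
    # One model per variety, as in the abstract.
    clean = remove_outliers(group, features + [target])
    X_train, X_test, y_train, y_test = train_test_split(
        clean[features], clean[target], test_size=0.2, random_state=0
    )
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    models[variety] = model
    scores[variety] = r2_score(y_test, model.predict(X_test))
    # Impurity-based importances, used here in place of YellowBrick.
    importances = pd.Series(model.feature_importances_, index=features)
    print(variety, round(scores[variety], 3), importances.idxmax())
```

Training a separate model per variety keeps each forest focused on one cultivar's behaviour, at the cost of less data per model, which may be one reason the variety with the fewest rows (Palmer) shows the lowest reported R².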
With the resulting models, the percentage distribution of mango calibers in each growing area can be predicted 6 months in advance, using as input data that characterize each area together with information on leaf nutrient content.
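Because each row pairs a plot with one caliber, a distribution for a new plot could be obtained by querying a fitted model once per caliber class. The sketch below shows that idea under stated assumptions: the caliber classes, feature names, and training data are hypothetical, and rescaling the predictions so they sum to 100% is an added convenience that the abstract does not mention.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

CALIBERS = [6, 8, 10, 12]  # hypothetical caliber classes
FEATURES = ["nitrogen", "plants_per_hectare", "caliber"]

# Made-up training table with one row per (plot, caliber) pair, mirroring the
# row layout described in the abstract. The numbers carry no agronomic meaning.
rng = np.random.default_rng(1)
rows = []
for _ in range(150):
    nitrogen = rng.normal(12, 2)
    density = rng.normal(400, 50)
    # Synthetic rule: shares peak at a caliber that shifts with nitrogen.
    weights = np.exp(-((np.array(CALIBERS) - (nitrogen - 3)) ** 2) / 8)
    shares = 100 * weights / weights.sum()
    for caliber, share in zip(CALIBERS, shares):
        rows.append({
            "nitrogen": nitrogen,
            "plants_per_hectare": density,
            "caliber": caliber,
            "pct_caliber": share,
        })
train = pd.DataFrame(rows)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[FEATURES], train["pct_caliber"])


def predict_distribution(model, plot, calibers=CALIBERS):
    """Query the model once per caliber and rescale the shares to 100%."""
    query = pd.DataFrame([{**plot, "caliber": c} for c in calibers])[FEATURES]
    raw = np.clip(model.predict(query), 0, None)
    return pd.Series(100 * raw / raw.sum(), index=calibers, name="pct_caliber")


# Features of a new plot, known months before harvest (hypothetical values).
new_plot = {"nitrogen": 13.0, "plants_per_hectare": 420.0}
distribution = predict_distribution(model, new_plot)
print(distribution.round(1))
```

Multiplying each predicted share by an expected total harvest weight would then give a per-caliber weight estimate, which is the planning use the abstract points to.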
Source journal
Information Processing in Agriculture
Subject area: Agricultural and Biological Sciences, Animal Science and Zoology
CiteScore: 21.10
Self-citation rate: 0.00%
Articles published: 80
About the journal: Information Processing in Agriculture (IPA) was established in 2013. It encourages the development of a science and technology of information processing in agriculture through the following aims:
• Promote the use of knowledge and methods from information processing technologies in agriculture;
• Showcase the experiences and publications of institutes, universities, and government, as well as profitable agricultural technologies;
• Provide opportunities and a platform for researchers in information processing worldwide to exchange knowledge, strategies, and experiences;
• Promote and encourage interaction among agricultural scientists, meteorologists, and biologists (pathologists/entomologists) with IT professionals and other stakeholders to develop and implement methods, techniques, and tools, and to address issues, related to information processing technology in agriculture;
• Create and promote expert groups for the development of agro-meteorological databases, crop and livestock modelling, and applications for crop-performance-based decision support systems.
Topics of interest include, but are not limited to:
• Smart Sensor and Wireless Sensor Network
• Remote Sensing
• Simulation, Optimization, Modeling and Automatic Control
• Decision Support Systems, Intelligent Systems and Artificial Intelligence
• Computer Vision and Image Processing
• Inspection and Traceability for Food Quality
• Precision Agriculture and Intelligent Instrument
• The Internet of Things and Cloud Computing
• Big Data and Data Mining