Luciana D. Rocha, G. I. Gadotti, Ruan Bernardy, R. D. Pinheiro, R. D. C. M. Monteiro
{"title":"高粱种子批次排序的数据挖掘","authors":"Luciana D. Rocha, G. I. Gadotti, Ruan Bernardy, R. D. Pinheiro, R. D. C. M. Monteiro","doi":"10.1590/1983-21252023v36n224rc","DOIUrl":null,"url":null,"abstract":"ABSTRACT The ranking of seed lots is a fundamental process for all companies in the seed industry. This work aims to demonstrate data mining methods for ranking sorghum seed lots during the seed processing through analysis of quality control data. Germination and cold tests were performed to verify the physiological quality of the lots. Seed samples from each lot were evaluated in two moments: post-cleaning and finished product (ready for marketing). The results after pre-processing totaled 188 rows of data with six attributes, encompassing 150 lots accepted for marketing, 6 rejected, and 32 intermediate lots. The classifiers used were J48, Random Forest, Classification Via Regression, Naive Bayes, Multilayer Perceptron, and IBk. The Resample filter was used for adjustment of the data. The k-fold technique was used for training, with ten folds. The metrics of Accuracy, Precision, Recall, F-measure, and ROC Area were used to verify the accuracy of the algorithms. The results obtained were used to determine the best machine-learning algorithm. IBk and J48 presented the highest accuracy of data; the IBk technique presented the best results. The Resample filter was essential for solving the data imbalance problem. Sorghum seed lots can be classified with great accuracy and precision through artificial intelligence and machine learning technique.","PeriodicalId":21558,"journal":{"name":"Revista Caatinga","volume":" ","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data mining for ranking sorghum seed lots\",\"authors\":\"Luciana D. Rocha, G. I. Gadotti, Ruan Bernardy, R. D. Pinheiro, R. D. C. M. Monteiro\",\"doi\":\"10.1590/1983-21252023v36n224rc\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT The ranking of seed lots is a fundamental process for all companies in the seed industry. This work aims to demonstrate data mining methods for ranking sorghum seed lots during the seed processing through analysis of quality control data. Germination and cold tests were performed to verify the physiological quality of the lots. Seed samples from each lot were evaluated in two moments: post-cleaning and finished product (ready for marketing). The results after pre-processing totaled 188 rows of data with six attributes, encompassing 150 lots accepted for marketing, 6 rejected, and 32 intermediate lots. The classifiers used were J48, Random Forest, Classification Via Regression, Naive Bayes, Multilayer Perceptron, and IBk. The Resample filter was used for adjustment of the data. The k-fold technique was used for training, with ten folds. The metrics of Accuracy, Precision, Recall, F-measure, and ROC Area were used to verify the accuracy of the algorithms. The results obtained were used to determine the best machine-learning algorithm. IBk and J48 presented the highest accuracy of data; the IBk technique presented the best results. The Resample filter was essential for solving the data imbalance problem. Sorghum seed lots can be classified with great accuracy and precision through artificial intelligence and machine learning technique.\",\"PeriodicalId\":21558,\"journal\":{\"name\":\"Revista Caatinga\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Revista Caatinga\",\"FirstCategoryId\":\"97\",\"ListUrlMain\":\"https://doi.org/10.1590/1983-21252023v36n224rc\",\"RegionNum\":4,\"RegionCategory\":\"农林科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"AGRONOMY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Revista Caatinga","FirstCategoryId":"97","ListUrlMain":"https://doi.org/10.1590/1983-21252023v36n224rc","RegionNum":4,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"AGRONOMY","Score":null,"Total":0}
ABSTRACT The ranking of seed lots is a fundamental process for all companies in the seed industry. This work aims to demonstrate data mining methods for ranking sorghum seed lots during the seed processing through analysis of quality control data. Germination and cold tests were performed to verify the physiological quality of the lots. Seed samples from each lot were evaluated in two moments: post-cleaning and finished product (ready for marketing). The results after pre-processing totaled 188 rows of data with six attributes, encompassing 150 lots accepted for marketing, 6 rejected, and 32 intermediate lots. The classifiers used were J48, Random Forest, Classification Via Regression, Naive Bayes, Multilayer Perceptron, and IBk. The Resample filter was used for adjustment of the data. The k-fold technique was used for training, with ten folds. The metrics of Accuracy, Precision, Recall, F-measure, and ROC Area were used to verify the accuracy of the algorithms. The results obtained were used to determine the best machine-learning algorithm. IBk and J48 presented the highest accuracy of data; the IBk technique presented the best results. The Resample filter was essential for solving the data imbalance problem. Sorghum seed lots can be classified with great accuracy and precision through artificial intelligence and machine learning technique.
期刊介绍:
A Revista Caatinga é uma publicação científica que apresenta periodicidade trimestral, publicada pela Pró-Reitoria de Pesquisa e Pós-Graduação da Universidade Federal Rural do Semi-Árido – UFERSA, desde 1976.
Objetiva proporcionar à comunidade científica, publicações de alto nível nas áreas de Ciências Agrárias e Recursos Naturais, disponibilizando, integral e gratuitamente, resultados relevantes das pesquisas publicadas.