Building a near-infrared (NIR) soil spectral dataset and predictive machine learning models using a handheld NIR spectrophotometer

Impact factor: 1.0; JCR quartile: Q3 (Multidisciplinary Sciences)
Colleen Partida, Jose Lucas Safanelli, Sadia Mannan Mitu, Mohammad Omar Faruk Murad, Yufeng Ge, Richard Ferguson, Keith Shepherd, Jonathan Sanderman
Data in Brief, volume 58, Article 111229. Published 2025-02-01. DOI: 10.1016/j.dib.2024.111229
Full text: https://www.sciencedirect.com/science/article/pii/S2352340924011910 (PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731769/pdf/)

Abstract

This near-infrared spectral dataset consists of 2,106 diverse mineral soil samples, each scanned, on average, on six different units of the same low-cost, commercially available handheld spectrophotometer. Most soil samples were selected from the USDA NRCS National Soil Survey Center-Kellogg Soil Survey Laboratory (NSSC-KSSL) soil archives to represent the diversity of mineral soils (0–30 cm) found in the United States, while 90 samples were selected from Ghana, Kenya, and Nigeria to represent the African soils available in the same archive. All scanning was performed on dried and sieved (<2 mm) soil samples. Machine learning predictive models were developed in the R programming language to predict soil organic carbon (SOC), pH, bulk density (BD), carbonate (CaCO3), exchangeable potassium (Ex. K), sand, silt, and clay content from the spectra, using most of this dataset (1,976 US soils); these models are included in this data release. Two model types, Cubist and partial least squares regression (PLSR), were developed using two strategies: (1) using an average of the spectral scans across devices for each sample, and (2) using the replicate spectral scans across devices for each sample. We present the internal performance of these models here. The dry spectra and Cubist models for these soil properties are available for download from 10.5281/zenodo.7586621. An example of detailed code used to produce these models is hosted at the Open Soil Spectral Library, a free service of the Soil Spectroscopy for the Global Good Network (soilspectroscopy.org), enabling broad use of these data for multiple soil monitoring applications.
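The two modelling strategies named in the abstract differ only in how the training table is assembled from the per-device scans. The following is a minimal sketch of that difference, not the authors' code (their models were built in R): the sample IDs, device IDs, three-band spectra, reference values, and the function names `build_averaged` and `build_replicates` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical toy records: (sample_id, device_id, spectrum).
# Real spectra have many more bands; three are used to keep the example short.
scans = [
    ("S1", "dev_a", [0.10, 0.20, 0.30]),
    ("S1", "dev_b", [0.12, 0.22, 0.32]),
    ("S2", "dev_a", [0.40, 0.50, 0.60]),
    ("S2", "dev_b", [0.44, 0.54, 0.64]),
    ("S2", "dev_c", [0.42, 0.52, 0.62]),
]

# Hypothetical laboratory reference values, one per sample (e.g. SOC in %).
reference = {"S1": 1.5, "S2": 3.0}


def build_averaged(scans, reference):
    """Strategy 1: one row per sample, spectrum averaged across devices."""
    by_sample = defaultdict(list)
    for sample_id, _device_id, spectrum in scans:
        by_sample[sample_id].append(spectrum)
    rows = []
    for sample_id, spectra in sorted(by_sample.items()):
        # zip(*spectra) groups the values band by band across devices.
        mean_spectrum = [fmean(band) for band in zip(*spectra)]
        rows.append((sample_id, mean_spectrum, reference[sample_id]))
    return rows


def build_replicates(scans, reference):
    """Strategy 2: one row per scan; replicates share the sample's reference value."""
    return [
        (sample_id, device_id, spectrum, reference[sample_id])
        for sample_id, device_id, spectrum in scans
    ]


averaged = build_averaged(scans, reference)
replicates = build_replicates(scans, reference)
print(len(averaged), "averaged rows;", len(replicates), "replicate rows")
```

Under the replicate strategy the same reference value appears once per device scan, so one common precaution when validating such a model is to keep all scans of a sample in the same fold; whether the published models were validated this way is not stated in the abstract.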