Near-infrared spectroscopy is an established tool for estimating nutrients in diverse sample matrices. Near-infrared transmittance (NIT) in particular is widely used in whole-grain analysis for oil, protein, and other macronutrients. NIT spectra obtained from samples are regressed against laboratory reference values to develop prediction models. However, the raw spectra are poorly resolved, slightly noisy, and show baseline drift. To increase the resolution and signal-to-noise ratio, derivatives are commonly used as preprocessing tools, typically together with smoothing.
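As an illustration of derivative preprocessing with smoothing, the following minimal sketch applies a boxcar smooth followed by repeated finite differences between points a fixed number of data points (the "gap") apart. The function names `moving_average` and `gap_derivative` are hypothetical, and the exact gap and segment definitions used by commercial NIR software may differ from this simplified form; the input is synthetic, not NIT data.

```python
import numpy as np

def moving_average(x, segment):
    """Boxcar smoothing over `segment` points (segment <= 1 returns x unchanged)."""
    x = np.asarray(x, dtype=float)
    if segment <= 1:
        return x
    kernel = np.ones(segment) / segment
    # mode="same" keeps the length; values near the edges are attenuated
    return np.convolve(x, kernel, mode="same")

def gap_derivative(x, order, gap, segment=1):
    """Smooth, then take `order` successive differences between points `gap` apart.

    Assumption: a simplified gap-difference scheme, not a vendor's exact algorithm.
    Each pass shortens the signal by `gap` points.
    """
    y = moving_average(x, segment)
    for _ in range(order):
        y = y[gap:] - y[:-gap]
    return y

# Synthetic linear baseline drift (illustrative values only)
t = np.arange(20.0)
baseline = 3.0 * t + 5.0
print(gap_derivative(baseline, order=2, gap=3))  # all zeros: linear drift is removed
```

A second-order difference cancels any linear baseline, which is why derivative treatments are effective against drift; a larger gap spaces the differenced points further apart, so each difference spans a wider part of the spectrum.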
A systematic study of derivative orders (1, 2, and 3) and gaps (2–90) was performed. A germplasm set with high variability in protein content (8.63%–19.56%) was used, and regression models were developed using the modified partial least squares method. The second-order derivative gave the best-fit models overall; hence, the effect of gap at the second-order derivative was studied in detail. The plot of R² for the external validation set against gap at the second-order derivative showed three peaks, at gaps of 47, 60, and 69–71; the highest R² (0.985) was obtained for the third peak, which spans three consecutive gap segments.
Hence, the math treatment (2, 70, 2, 1) was finalized on the grounds of stability; it gave a high residual prediction deviation of 7.149 and a low bias of 0.021. A paired t test and a reliability test between predicted and laboratory values confirmed that the differences between them were nonsignificant. Thus, the developed model is robust and precise and can be used for high-throughput screening of wheat germplasm.
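The validation statistics named above can be computed along the following lines. This is a sketch under stated assumptions: bias is taken as the mean of predicted minus reference values, the standard error of prediction (SEP) as the bias-corrected standard deviation of the residuals, and the residual prediction deviation (RPD) as the standard deviation of the reference values divided by SEP. Definitions vary between authors, the function name `validation_stats` is hypothetical, and the numbers are made-up illustrative values, not data from this study.

```python
import numpy as np

def validation_stats(reference, predicted):
    """Bias, SEP, RPD, and paired t statistic for a validation set.

    Assumed definitions (conventions differ between authors):
      bias = mean(predicted - reference)
      SEP  = sample standard deviation of the residuals (bias-corrected)
      RPD  = SD(reference) / SEP
      t    = bias / (SEP / sqrt(n)), with n - 1 degrees of freedom
    """
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = pred - ref
    n = resid.size
    bias = resid.mean()
    sep = resid.std(ddof=1)
    rpd = ref.std(ddof=1) / sep
    t_stat = bias / (sep / np.sqrt(n))
    return {"bias": bias, "sep": sep, "rpd": rpd, "t": t_stat, "df": n - 1}

# Made-up protein values (%), for illustration only
lab = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
nit = lab + np.array([0.1, -0.1, 0.1, -0.1, 0.0])
stats = validation_stats(lab, nit)
print(stats)
```

A paired t statistic close to zero relative to its critical value at `df` degrees of freedom indicates no significant systematic difference between predicted and laboratory values, which is the sense in which the two sets of values are compared here.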
The better performance of the second derivative with a larger gap can be exploited to develop robust, low-bias models by avoiding multicollinearity, which is usually a limitation in multivariate analysis.