The mathematical basis of improved calibration through selection of informative variables for partial least-squares calibration has been identified. A theoretical investigation of calibration slopes indicates that including uninformative wavelengths negatively affect calibrations by producing both large relative bias toward zero and small additive bias away from the origin. These theoretical results are found regardless of the noise distribution in the data. Studies are performed to confirm this result using a previously used selection method compared to a new method, which is designed to perform more appropriately when dealing with data having large outlying points by including estimates of spectral residuals. Three different data sets are tested with varying noise distributions. In the first data set, Gaussian and log-normal noise was added to simulated data which included a single peak. Second, near-infrared spectra of glucose in cell culture media taken with an FT-IR spectrometer were analyzed. Finally, dispersive Raman Stokes spectra of glucose dissolved in water were assessed. In every case considered here, improved prediction is produced through selection, but data with different noise characteristics showed varying degrees of improvement depending on the selection method used. The practical results showed that, indeed, including residuals into ranking criteria improves selection for data with noise distributions resulting in large outliers. It was concluded that careful design of a selection algorithm should include consideration of spectral noise distributions in the input data to increase the likelihood of successful and appropriate selection.
ASJC Scopus subject areas
- Analytical Chemistry