The value of a number

Posted: 12 December 2009 | Richard Dempster, Director, Product and Technological Development, AIB International

Often, we get in the habit of accepting numbers from computerised displays without regard to accuracy or precision, and when we do evaluate a number, we usually look only at how precise it is. We forget that we can be very precisely wrong. We don’t really pay close attention to numbers from our bank’s ATM, a gas pump or a near-infrared instrument unless we think they are substantially wrong. We certainly pay closer attention to our bank account, but we tend to accept numbers from other devices that may have greater monetary importance and higher error rates. In this article, I will give a brief overview of the main sources of error specifically associated with near-infrared (NIR) instruments and what effect these errors have on the number displayed. The overall goal is to interpret the numbers correctly. I use NIR as a general term to include both reflective and transmission instruments.

NIR instruments typically have three main sources of error: instrument error, reference error and the math that ties those two together, the regression model (calibration). All of these errors enter into the number displayed. Most users will fault the instrument for an errant number, but today’s NIR instruments probably contribute the least error to the number.

One main source of NIR error is in the laboratory values. These values are used to create a regression model (calibration). Most laboratories return the average value of the sample after analysing several sub-samples of the product. You rarely see the standard deviation of those sub-samples, nor do you see how many sub-samples were used to calculate the average. I have the good fortune to be director of our laboratory and to generate the calibrations that I research; therefore, I can keep close track of the laboratory error rate. I require a minimum of three sub-samples for all samples destined for specialised calibrations. I monitor the within-sample standard deviation of these sub-samples closely and re-examine any outliers that may occur. In addition, I run true replicates without the knowledge of the laboratory personnel.

To digress a bit, laboratory error is the difference between the values you get from the laboratory and the actual values. When discussing the actual value, we’re looking at accuracy, and the actual value is very elusive. Usually, we use probability to determine the most likely value, but this is a subject for another article. What we’re really looking for is the difference between the actual value and what the laboratory reported. The best method to determine this error is to subscribe to a check sample service, where the same (as close as possible) product is sent to various laboratories in the hope that the mean of the laboratories is close to the actual value.
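As a rough illustration of the check sample idea, the sketch below compares one laboratory's result with the mean of all participating laboratories. The function name and the moisture values are hypothetical, and treating the laboratory mean as a stand-in for the actual value is the same hopeful assumption described above.

```python
from statistics import mean


def check_sample_bias(own_result, all_lab_results):
    """Difference between one laboratory's result and the mean of all laboratories."""
    if not all_lab_results:
        raise ValueError("need results from at least one laboratory")
    # The mean of the laboratories stands in for the (unknown) actual value.
    consensus = mean(all_lab_results)
    return own_result - consensus


# Hypothetical moisture results (%) from five laboratories for one check sample.
labs = [13.8, 14.1, 14.0, 13.9, 14.2]
bias = check_sample_bias(14.2, labs)
print(f"bias against the laboratory mean: {bias:+.2f}")
```

A bias that keeps the same sign over several check samples says more than any single result.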

You can keep track of laboratory errors in three ways. First, monitor check samples every few months, as most check sample services only send one per month. Second, send in true replicates to detect between-sample variation, and occasionally send replicates that are days apart to test for day-to-day variability or to see if climatic changes influence the results; care of the sample is a must, as biological material changes with time. These replicates are always sent in blind, with the code known only to you. Finally, if possible, monitor within-sample variation using the individual sub-sample results that make up the average. A few years back, we installed a database that requires all the sub-sample results to be entered. This made keeping track of the within-sample variation quite easy, especially since I developed a simple program that automatically computes the standard deviation and coefficient of variation (CV) and emails the report directly to me.
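The within-sample calculation is simple enough to sketch. The following is not the program mentioned above, only a minimal illustration of the same statistics; the function name and the protein values are hypothetical, and the three sub-sample minimum mirrors the requirement stated earlier for calibration samples.

```python
from statistics import mean, stdev


def within_sample_stats(sub_samples):
    """Return (mean, standard deviation, CV in percent) for one sample's sub-samples."""
    if len(sub_samples) < 3:
        raise ValueError("at least three sub-samples are required")
    avg = mean(sub_samples)
    sd = stdev(sub_samples)   # sample standard deviation (n - 1 in the denominator)
    cv = 100.0 * sd / avg     # coefficient of variation, as a percentage of the mean
    return avg, sd, cv


# Hypothetical protein results (%) for three sub-samples of one sample.
avg, sd, cv = within_sample_stats([12.1, 12.3, 12.2])
print(f"mean={avg:.2f}  sd={sd:.3f}  cv={cv:.2f}%")
```

Reporting the CV alongside the standard deviation makes samples with very different mean values easier to compare.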

Jumping ahead to the NIR instrumental errors: depending on what is being analysed, there may be more sources of error than in the laboratory, but each carries much less value and weight. Instrument noise, usually generated by heat, can vary with the location of the instrument. There is not much you can do about this error other than understand it, noting that it may affect the fringes of the sensor more because detection sensitivity is lower in the extreme ranges, which lowers the signal-to-noise ratio. If the noise is too great, increasing the number of scans per sample may help, though it may be better to move the NIR instrument to a controlled environment. When developing calibrations, try not to include the ends of the spectra, where detector response is poor. You can save a lot of calibration time if you can find the response curve of the detector you’re working with. Usually the response curve of a standard reference material will give adequate information for determining which spectral regions to exclude from a calibration because of noise or sensor sensitivity.

One error that should be examined prior to purchasing a new instrument is repeatability without replacement. That is, can you get the same number by just re-scanning the sample without moving it? If the standard deviation is too large for your application, look for another instrument. The acceptable standard deviation is usually determined by the desired precision of the product to be predicted. Because the problem could be in the regression model, I only compare raw spectra from these scans.

Drift is an error that I’m starting to see more of. As the electronic components age, their performance drifts away from the original specification. Today, I find a lot of older instruments still being used without any testing of the prediction error rate. The number came from the instrument so it must be right!
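The repeatability-without-replacement check described above can be sketched as a standard deviation taken at each wavelength across repeated raw scans of one unmoved sample. The function name and the absorbance values are hypothetical, and a real spectrum would have far more points than the four shown here.

```python
from statistics import stdev


def repeatability_sd(scans):
    """Per-wavelength standard deviation across repeated raw scans of one sample."""
    if len(scans) < 2:
        raise ValueError("need at least two scans")
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must cover the same wavelengths")
    # zip(*scans) groups the readings taken at the same wavelength.
    return [stdev(readings) for readings in zip(*scans)]


# Hypothetical raw absorbance readings: three scans, four wavelengths each.
scans = [
    [0.500, 0.612, 0.745, 0.910],
    [0.502, 0.611, 0.747, 0.935],
    [0.501, 0.613, 0.746, 0.890],
]
sds = repeatability_sd(scans)
worst = max(range(len(sds)), key=sds.__getitem__)
print("per-wavelength SD:", [round(sd, 4) for sd in sds])
print("largest SD at wavelength index", worst)
```

Working on raw spectra keeps the regression model out of the comparison, and a large standard deviation at the ends of the range is one sign of the detector sensitivity issue mentioned earlier.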

Errors that are also associated with instruments but are not instrumental errors include the following: particle size, packing pressure, temperature (especially in liquids), and others based on the type and style of NIR instrument. These are described as presentation errors; they are independent of the NIR instrument itself but can be specific to it because of the manner in which a sample must be presented. Even though you have a standard operating procedure, just changing who prepares and scans the sample can make a difference. The point is that we tend to become relaxed when we’re doing the same procedure every day. We also tend to have the newer personnel do the routine ‘simple’ duties, and then wonder why we get inconsistent results. Either we’re back on track or we need further training, but we must monitor the results in order to make the correct decision.

Including all the errors associated with the laboratory procedure and the errors associated with the instrument, you can construct a probability density plot that lets you see the possible numbers you can get from any one sample. Figure 1 is a three-dimensional plot that illustrates the combined effect of laboratory variation and instrument variation. In this illustration, the numbers used are just examples but are close to variations I have seen. Figure 2 is the same plot, just rotated to give a perspective of the different widths of the two error variations. When a value is given from an NIR scan, it can come from any part of the area that is not solid blue. We assume the true value is in the centre, and we hope that our value is in the centre, too. In all probability, the number’s location is some distance from the centre and the true value is not at the centre either. All samples I work with come from a normally distributed population; therefore, these plots are valid for my samples. In cases where the distribution may be exponential, binary or some other distribution that may occur in the chemical industry or other industries, the plots should be constructed using the appropriate distribution formula. Of course, the regression model comes into play here as well, but that generally affects the position of the probability distribution in the x, y plane and not the laboratory or NIR instrumental standard deviation.
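A surface of this kind can be sketched by multiplying two normal densities, one for laboratory error and one for instrument error. This is only an illustration: it assumes the two errors are independent and centred on zero, and the standard deviations below are hypothetical values, not the ones behind the figures.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def joint_density(lab_error, nir_error, lab_sd, nir_sd):
    """Joint density of two independent, zero-mean normal errors (an assumption)."""
    return normal_pdf(lab_error, lab_sd) * normal_pdf(nir_error, nir_sd)


LAB_SD = 0.30  # hypothetical laboratory standard deviation
NIR_SD = 0.10  # hypothetical instrument standard deviation

# Evaluate the surface on a small grid, from -2 to +2 standard deviations per axis.
steps = [-2, -1, 0, 1, 2]
for i in steps:
    row = [joint_density(i * LAB_SD, j * NIR_SD, LAB_SD, NIR_SD) for j in steps]
    print("  ".join(f"{d:6.3f}" for d in row))
```

Each printed row holds the laboratory error fixed and steps across the instrument error. For another distribution, `normal_pdf` would be swapped for the appropriate density function.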

Finally, we come to the regression model that ties the reference laboratory values and the instrument spectra together to yield the number you hope is correct. The goal is to find the perfect relationship between the NIR spectrum and laboratory reference values. In reality, we’re trying to find a reasonable relationship given the instrument error space and the laboratory error space. This is an area where there is still a lot of discussion, research and development. There are many regression models available and many choices for pre-treatment of the spectra and laboratory values. A good calibration requires the following: skill, the proper tools, knowledge of the population space, knowledge of the spectral space, knowledge of chemistry (specifically food chemistry in my case), patience and time. It should be clear that development of calibrations is expensive and there are no short cuts. Selection of the incorrect regression model will obviously yield incorrect results regardless of how good the laboratory data or the instrument is.

Obtaining good laboratory reference values can cost tens of thousands of dollars. This may be the main reason for not maintaining calibrations. There are some methods to reduce the cost of the reference samples. If you fully characterise the population under study, you may be able to select a unique sample set that fully encompasses the variation expressed by the population. In certain populations, obtaining this set may be impossible, as samples in the tails of a normal population may not occur in a timely manner. If you fail to account for the variation or range of the population, then large errors can occur as a direct result of the limits of the calibration. Using a calibration outside of its range accounts for the biggest source of error, especially in the food industry. We see so many samples within the first standard deviation that we just accept the one that is outside the third standard deviation. A calibration should never be used outside of the sampling range.
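One inexpensive safeguard is to flag any prediction that falls outside the range of reference values used to build the calibration. The sketch below assumes that range is known; the function name, the limits and the protein predictions are hypothetical.

```python
def outside_calibration_range(predictions, cal_min, cal_max):
    """Return (index, value) pairs for predictions outside [cal_min, cal_max]."""
    if cal_min > cal_max:
        raise ValueError("cal_min must not exceed cal_max")
    return [
        (index, value)
        for index, value in enumerate(predictions)
        if not cal_min <= value <= cal_max
    ]


# Hypothetical protein predictions (%) checked against a hypothetical
# calibration built from reference values between 10.0 and 15.0.
flagged = outside_calibration_range([11.8, 12.4, 15.9, 9.2], cal_min=10.0, cal_max=15.0)
for index, value in flagged:
    print(f"sample {index}: {value} is outside the calibration range")
```

A flagged sample is a candidate for reference analysis rather than a number to report.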

Another source of error is certain outliers. I have found a number of samples that are predicted incorrectly and, after having them reanalysed and rescanned, still fall outside the known prediction error rate but are perfectly good samples. Researching outliers is a growing trend, and recently there have been some suggestions that linear calibrations may be inadequate for our biological world. The difficulties of moving to a cubic, quadratic or polynomial calibration are great, and the closest we have today to a non-linear calibration is the neural network calibration, but these require very large sample sets for training purposes and can be costly.

One must mention sampling procedures any time you are obtaining small quantities from a large population. Often only a few grams of product are used to characterise metric tons of product. Even though the results from an NIR scan are within acceptable limits, an incorrect sample will incorrectly describe the product. One must know the variation of the population in order to properly sample it. In addition, when sampling for calibration development without knowing the full range of the population, you may unknowingly increase the error rate at the two ends of the regression model. You may not have the number of samples required to fully express the regression model. This is not necessarily an error of the regression model but an error in the procedure used to obtain a model. I recently had to redo a calibration because of lower bake absorption values I received from a new wheat crop year. In many cases, it may take many years to know or see the full range of possible values. Many users of NIR cannot adapt quickly to changes in the population, and many of today’s calibrations are based on too few samples, especially when dealing with year-to-year variability and major long-term weather cycles.

In summary, there are many books and web pages available to help reduce the various errors discussed. Dealing with noise alone may require a college course in noise theory. Therefore, in most cases, you may be limited in the control you have over these errors, but realising how the value you received was obtained is the first step to getting closer to that elusive true number. I may have painted a very poor picture of NIR, but as I visit various companies that use NIR, I find that there are far too many instruments that have not been tested or properly maintained since the day they were purchased. In reality, an NIR instrument that is well maintained, frequently tested and run by well-trained staff provides a rapid and very economical tool to obtain valuable data.

Issue

Issue 4 2009

Related people

Richard Dempster
