
COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them




Extrapolation Beyond the Range of the Data

Example: The following graph shows a curve fitted to men's best times in the 100 m dash for 2001 through 2009.

Men's best times in the 100 m dash, 2001 - 2009

If the trend for the years before 2006 had been used to predict the times for 2008 and 2009, the predictions would have been noticeably higher (slower) than the actual times. If the trend from 2008 to 2009 were used to predict the time for 2013, the predicted times would be implausibly fast.

In some instances, the purpose of a study is indeed to predict what will happen next month or next year, based on recent data. In that case, the researcher has to do the best they can, but the farther into the future the prediction reaches, the greater the uncertainty.
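One way to see how uncertainty grows with distance from the data is the standard error of a prediction from a simple least-squares line, which increases with (x_new - mean(x))^2. The sketch below is only an illustration: the data are made up (they are not the actual 100 m times), the function name prediction_se is hypothetical, and the formula assumes the straight-line model continues to hold outside the observed years. If the trend itself changes, as in the example above, the real uncertainty is larger than this formula suggests.

```python
import numpy as np

def prediction_se(x, y, x_new):
    """Standard error for predicting a new observation at x_new
    from a simple least-squares line fitted to (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    slope = np.sum((x - x_bar) * (y - y_bar)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = y - (intercept + slope * x)
    s2 = np.sum(residuals ** 2) / (n - 2)  # residual variance estimate
    # The last term grows as x_new moves away from the center of the data.
    return float(np.sqrt(s2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx)))

# Made-up illustrative data, NOT the real 100 m dash times.
rng = np.random.default_rng(0)
years = np.arange(2001, 2010)
times = 9.90 - 0.01 * (years - 2001) + rng.normal(0, 0.03, years.size)

for target in (2010, 2013, 2020):
    print(target, round(prediction_se(years, times, target), 3))
```

The printed standard errors increase with the target year, because each target is farther from the center of the observed years.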

A similar problem occurs, but is not so easy to detect, when considering more variables. 

Example: Consider a study whose purpose is to predict variable z in terms of variables x and y. Suppose the graph below plots the y versus x values for the data in blue, and we want to predict z for a case whose x and y values are shown by the yellow point. This prediction could be considered extrapolation beyond the range of the data, since the blue data points all lie within a roughly elliptical region, but the yellow point lies noticeably outside that region. (Note that in this example, we can detect that the yellow point lies outside the range of the data simply by graphing the data. In situations with more variables, the Mahalanobis distance is sometimes appropriate to detect possible extrapolation beyond the range of the data.)

Plot of y vs. x values for the data and for the point showing extrapolation
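As a rough sketch of the Mahalanobis distance check mentioned above, the code below measures how far a candidate point lies from the center of the data, scaled by the data's covariance. The data are synthetic and the function name mahalanobis_distance is hypothetical. The second candidate point has x and y values that each fall within the observed ranges, yet it sits well outside the elliptical cloud, which is the situation that is hard to spot when there are too many variables to graph.

```python
import numpy as np

def mahalanobis_distance(data, point):
    """Mahalanobis distance from `point` to the mean of `data`
    (rows are observations, columns are variables)."""
    data = np.asarray(data, dtype=float)
    point = np.asarray(point, dtype=float)
    center = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # sample covariance of the variables
    diff = point - center
    # Solve cov * v = diff rather than forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Synthetic, positively correlated (x, y) data for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 200)
y = 0.9 * x + rng.normal(0, 0.3, 200)
data = np.column_stack([x, y])

# A point that follows the x-y pattern, and one that does not.
d_inside = mahalanobis_distance(data, [1.0, 0.9])
d_outside = mahalanobis_distance(data, [1.0, -1.0])
print(d_inside, d_outside)
```

A large distance relative to those of the data points themselves suggests that a prediction at that point may be an extrapolation. What counts as "large" is a judgment call that depends on the number of variables and the shape of the data.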