Fitting AIDS Data
This applet explores AIDS data from the early years of the epidemic. It extends the LeastSquares applet of Section 1.1, providing tools for entering a data list and seeking polynomial functions that model the data in the least-squares sense. The controls for entering data are similar to those in the earlier LeastSquares applet, with the addition of a button for obtaining the best exponential fit to the data.
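For readers who want to reproduce the two kinds of fit outside the applet, here is a minimal sketch in Python with NumPy. It rests on several assumptions: the data values are made up (only the first value, 295, appears in these notes), the LS error is taken to be the sum of squared deviations, and the exponential fit is done by fitting a straight line to the logarithms of the data, which may not be the method the applet uses. The names `poly_fit` and `exp_fit` are hypothetical helpers.

```python
import numpy as np

# HYPOTHETICAL data for illustration only; these are NOT the CDC case counts.
# Only the first value (295) is taken from the notes.
t = np.arange(10, dtype=float)   # index of each observation: 0, 1, ..., 9
y = np.array([295, 480, 810, 1400, 2300, 3900, 6500, 11000, 18000, 30000],
             dtype=float)


def poly_fit(t, y, degree):
    """Return least-squares polynomial coefficients (highest power first)
    and the sum of squared deviations at the data points."""
    coeffs = np.polyfit(t, y, degree)
    deviations = np.polyval(coeffs, t) - y
    return coeffs, float(np.sum(deviations ** 2))


def exp_fit(t, y):
    """Fit y ~ a * exp(b * t) by a least-squares line through (t, log y).

    ASSUMPTION: this minimizes the error in log(y), not in y itself, so it
    may differ from the 'best exponential fit' computed by the applet."""
    b, log_a = np.polyfit(t, np.log(y), 1)
    a = float(np.exp(log_a))
    deviations = a * np.exp(b * t) - y
    return a, float(b), float(np.sum(deviations ** 2))


for degree in (2, 3, 4):
    _, err = poly_fit(t, y, degree)
    print(f"degree {degree}: sum of squared deviations = {err:.4g}")

a, b, err = exp_fit(t, y)
print(f"exponential {a:.4g} * exp({b:.4g} t): sum of squared deviations = {err:.4g}")
```

Because each higher-degree polynomial family contains the lower-degree ones, the polynomial error can only shrink as the degree grows; a smaller error alone therefore says little about which model is more believable.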
Note 1: The data provided are from the Centers for Disease Control's tracking of the AIDS epidemic in the 1980s. Use the applet to explore various curves that fit the data. Note that there is no unique solution; in particular, it is possible to obtain an exact polynomial fit if the polynomial has sufficiently high degree. But it is also easy to see that high-degree polynomials are not likely to express any underlying intrinsic behavior of the data set. High-degree polynomials can "wiggle" a great deal, and they can fit the data by virtue of their pliability, not because they model any fundamental behavior of the data.
Note 2: One research paper has claimed on epidemiological grounds that an epidemic like that of AIDS follows a cubic polynomial law, not an exponential growth law. The argument was based on constraints on the way AIDS is transmitted. From your investigations with this applet, do you think there is a basis for this research claim? How good are the quadratic, cubic, and fourth-degree polynomial approximations? And how do they compare with the best exponential fit?
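The "wiggle" can be seen in a small experiment. The sketch below is an illustration under stated assumptions, not the applet's code: the data are made up (a cubic trend plus an alternating disturbance of 20), and NumPy's `Polynomial.fit` stands in for whatever fitting routine the applet uses. It compares a cubic fit with the degree-9 polynomial that passes through all 10 points, looking at how far each strays from the trend halfway between the first two data points.

```python
import numpy as np
from numpy.polynomial import Polynomial

# HYPOTHETICAL data for illustration only; these are NOT the CDC case counts.
t = np.arange(10, dtype=float)


def trend(x):
    """Made-up smooth 'underlying behavior': a cubic."""
    return 300.0 + 50.0 * x ** 3


# Add an alternating disturbance of +/-20 to mimic irregular reporting.
y = trend(t) + 20.0 * (-1.0) ** np.arange(10)

cubic = Polynomial.fit(t, y, 3)   # least-squares cubic
exact = Polynomial.fit(t, y, 9)   # degree 9: passes through all 10 points

# How closely does the degree-9 polynomial hit the data points themselves?
max_miss = float(np.max(np.abs(exact(t) - y)))

# How far is each curve from the trend BETWEEN data points, at t = 0.5?
x_mid = 0.5
cubic_dev = abs(float(cubic(x_mid)) - trend(x_mid))
exact_dev = abs(float(exact(x_mid)) - trend(x_mid))

print(f"degree 9, largest miss at a data point: {max_miss:.3g}")
print(f"cubic,    distance from trend at t=0.5: {cubic_dev:.3g}")
print(f"degree 9, distance from trend at t=0.5: {exact_dev:.3g}")
```

In this made-up example the degree-9 curve matches every data point yet swings far from the trend between them, because it is forced to reproduce the disturbance as well as the trend.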
Note 3: If you are observant, you will notice that the 9th-degree polynomial that is displayed has an LS error of about 0.1. Why is it not zero? Notice in particular that the constant term in the polynomial is 294.998, whereas the value should be exactly 295 (an error of about 7 parts in a million). It should be possible to find a 9th-degree polynomial that passes exactly through the 10 given points. The answer has to do with the numerical facts of life: (1) the computer carries out its computations to about 16 significant digits of accuracy; (2) the process of finding the coefficients of the best LS polynomial involves solving a system of 10 equations in 10 unknowns, requiring thousands of multiplications and additions; (3) the particular AIDS data involve numbers of very different sizes, the largest being more than 500 times the smallest; (4) the coefficients of the LS polynomial are of very different sizes, the largest being 180,000 times larger than the smallest. It seems quite remarkable that the resulting LS polynomial does such a good job of approximating the given AIDS data points! If the LS error is the sum of the squared deviations over the 10 data points, an LS error of 0.1 means that the value of (p(i) - yi)^2 for a typical data point (i, yi) is about 0.01, and thus | p(i) - yi | is about sqrt(0.01) = 0.1. Not a bad error considering the large values involved. You might enjoy zooming in on the data point (0, 295) to get some feeling for how close the polynomial comes to passing exactly through that point. If you are not an observant person, you might have overlooked the fact that the LS error was not exactly zero. After all, what is 0.099 among friends, when the size of the friends is in the hundreds of thousands?
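The effect of finite precision can be explored by solving the same 10-by-10 system twice, once in exact rational arithmetic and once in ordinary floating point. The following is a sketch under assumptions: the data are made up except for the first value, 295; the system is the square interpolation system with columns 1, t, ..., t^9, which may not be the system the applet actually solves; and `solve_exact` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction

import numpy as np

# HYPOTHETICAL counts for illustration only (NOT the CDC data); only the
# first value, 295, is taken from the notes.
t = list(range(10))
y = [295, 480, 810, 1400, 2300, 3900, 6500, 11000, 18000, 30000]


def solve_exact(rows, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(rhs)
    a = [[Fraction(v) for v in row] + [Fraction(r)]
         for row, r in zip(rows, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        # Scale the pivot row so the pivot is 1, then clear the column.
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n] for row in a]


# System V c = y, where row i of V is 1, t_i, t_i^2, ..., t_i^9.
V = [[ti ** k for k in range(10)] for ti in t]
V_float = np.array(V, dtype=float)

exact = solve_exact(V, y)                                     # Fractions
floating = np.linalg.solve(V_float, np.array(y, dtype=float))  # ~16 digits

# The condition number indicates how strongly rounding errors can be amplified.
cond_V = float(np.linalg.cond(V_float))

# Largest relative disagreement between floating and exact coefficients.
worst = max(abs(float(f) - float(e)) / abs(float(e))
            for f, e in zip(floating, exact) if e != 0)

print("exact constant term:", exact[0])
print("condition number of V:", f"{cond_V:.3g}")
print("worst relative coefficient error:", f"{worst:.3g}")
```

In exact arithmetic the constant term equals the first data value, since p(0) is just the constant term. The floating-point solution can only be expected to agree with the exact one to roughly 16 digits minus the number of digits in the condition number.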
To think about: The last datum in the list was for the year
1991. Using the field for calculating values of the LS polynomial, extrapolate
to the year 1995. How many AIDS cases would you predict for 1995?
Note that the result depends strongly on the degree
you choose for the approximating polynomial. Would you trust the "exact"
polynomial of degree 9? Why does a cubic or quartic polynomial seem
more reasonable? What value would the best exponential approximation
predict? Do you believe it? Using the cubic approximating polynomial,
what value is predicted for the year 2002? Can you give reasons why
the cubic model gives a good prediction for 1995 but not for 2002?
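The extrapolation exercise can also be tried outside the applet. The sketch below makes several assumptions: the data are made up (only the first value, 295, comes from these notes); the observations are taken to be yearly, so that index 9 corresponds to 1991, index 13 to 1995, and index 20 to 2002; and the exponential model is fitted through the logarithms of the data, which may differ from the applet's method. The names `predict_poly` and `predict_exp` are hypothetical helpers.

```python
import numpy as np
from numpy.polynomial import Polynomial

# HYPOTHETICAL data for illustration only; these are NOT the CDC case counts.
t = np.arange(10, dtype=float)
y = np.array([295, 480, 810, 1400, 2300, 3900, 6500, 11000, 18000, 30000],
             dtype=float)


def predict_poly(t, y, degree, t_new):
    """Value at t_new of the least-squares polynomial of the given degree."""
    p = Polynomial.fit(np.asarray(t, dtype=float),
                       np.asarray(y, dtype=float), degree)
    return float(p(t_new))


def predict_exp(t, y, t_new):
    """Value at t_new of a * exp(b * t), fitted as a line through (t, log y)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    b, log_a = np.polyfit(t, np.log(y), 1)
    return float(np.exp(log_a + b * t_new))


# ASSUMPTION: yearly data with index 9 = 1991, so 1995 -> 13 and 2002 -> 20.
for year, index in ((1995, 13.0), (2002, 20.0)):
    print(f"predictions for {year} (index {index:g}):")
    for degree in (2, 3, 4, 9):
        value = predict_poly(t, y, degree, index)
        print(f"  degree {degree}: {value:.6g}")
    print(f"  exponential: {predict_exp(t, y, index):.6g}")
```

Running this with different degrees shows how far apart the predictions drift once the evaluation point moves beyond the range of the data, which is the behavior the questions above ask you to examine.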