After all, facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, "Lies - damned lies - and statistics,"
still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of.
Leonard Courtney

Homework for Math 10: Introductory Statistics

Remember homework is not for handing in, but only to prepare you for quizzes (which will generally be problems nearly identical to the homework) and exams.

Answers to the homework sets from the textbook (those labeled with letters) may be found in the book's appendix. In consideration of others who wish to use the textbook without the answers to the review exercises available, I will not post those. Answers to supplemental exercises are available.

Jump to extra review problems for: Midterm 1  ◊  Midterm 2  ◊  Final Exam

Part IV: Probability

Chapter 13: What are the chances?

These exercises will be covered on Quiz 1, Apr 2.

1. Set A, p 225: #1
2. Set B, p 227: #1-4
3. Set C, p 229: #1-7
4. Set D, p 232: #1-8
6. Review exercises, p 234: #1-12

1. Roll 4 six-sided dice, colored red, blue, green, and yellow, and observe the numbers that come up.
1. What is the probability the red and blue dice show 1?
2. What is the probability all dice show even numbers?
3. What is the probability at least one die shows a 6?
4. What is the probability the red die shows 5 and the green die shows 3?
2. Draw two cards, one at a time and without replacement, from a well-shuffled deck of 52.
1. What is the probability both cards are kings?
2. What is the probability neither card is a king?
3. What is the probability the second card is red, given that the first card was a diamond?
4. What is the probability the second card is a heart, given that the first card was a spade?
3. Roll one die and observe the number showing. Are the events "is greater than 4" and "is an even number" dependent or independent?
4. We will go through this in class, but try it yourself as well. You have a bag containing 5 blue marbles, 6 black chessmen, 5 white chessmen, 12 black checkers, and 5 red checkers.
1. Draw one object at random from the bag and observe its color and type.
1. What is the probability the object is black?
2. Are the events "is black" and "is a chessman" independent? How about "is black" and "is a checker"? Finally, "is a chessman" and "is a checker"?
3. What is the probability that the object is a checker given that it is (a) red, or (b) blue?
2. Draw two objects from the bag, one at a time, with replacement.
1. What is the probability both objects are blue?
2. What is the probability neither object is blue?
3. What is the probability both objects are blue, given that at least one was blue?
3. Draw two objects from the bag, one at a time, without replacement.
1. What is the probability both objects are red?
2. What is the probability neither object is red?
3. What do you think the probability should be that exactly one of the objects is red? We will learn this in Chapter 14.
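
If you want to check your answers to the dice questions by computer, one option is to list every possible roll and count. The following is only a minimal sketch: the helper name `probability` and the (red, blue, green, yellow) ordering of each roll are assumptions made for illustration, not anything from the textbook.

```python
from fractions import Fraction
from itertools import product

def probability(event, n_dice=4):
    """Exact probability of `event` over all equally likely rolls of n_dice dice."""
    rolls = list(product(range(1, 7), repeat=n_dice))
    favorable = sum(1 for roll in rolls if event(roll))
    return Fraction(favorable, len(rolls))

# Assumed ordering within each roll: (red, blue, green, yellow).
p_red_blue_ones = probability(lambda roll: roll[0] == 1 and roll[1] == 1)
p_at_least_one_six = probability(lambda roll: 6 in roll)

# The complement rule should agree with the brute-force count.
p_via_complement = 1 - Fraction(5, 6) ** 4

print(p_red_blue_ones, p_at_least_one_six, p_via_complement)
```

Counting works here only because all 1296 rolls are equally likely; the multiplication and complement rules get you the same numbers with far less work.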

Chapter 14: More about chance

These exercises will be covered on Quiz 2, Apr 9.

2. Set B, p 242: #1-6
3. Set C, p 246: #1-5 (note: the book says "true" for 3f but the answer is false)
4. Set D, p 250: #1-7
6. Review exercises, p 252: #1-14

1. A coin is flipped and two dice are rolled. What is the probability of getting a head or a pair of 1s?
2. The probability of event A is 30% and the probability of event B is 40%. If the probability of event "A and B" is 15%, what is the probability that neither event A nor event B occurs?
3. The probability of event A is 45%, the probability of event B is 20%, and the probability of event "A or B" is 56%. Are event A and event B independent?
4. The probability of event A is 20%, and the probability of event B is 50%. What are the minimum and maximum possible values for the probability of the events "A and B" and "A or B"?
5. Repeat exercise 4 with probabilities 40% and 80% for events A and B, respectively.
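
The addition rule and the limits it puts on "A and B" can be sketched in a few lines. The function names below are made up for illustration, and the numbers are deliberately not the ones in the exercises.

```python
from fractions import Fraction

def p_or(p_a, p_b, p_and):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_and

def and_bounds(p_a, p_b):
    """Smallest and largest possible P(A and B), given only P(A) and P(B)."""
    smallest = max(Fraction(0), p_a + p_b - 1)  # overlap is forced once the two sum past 100%
    largest = min(p_a, p_b)                     # one event sits entirely inside the other
    return smallest, largest

def is_independent(p_a, p_b, p_and):
    """Multiplication check: independent exactly when P(A and B) = P(A) * P(B)."""
    return p_and == p_a * p_b

# Illustrative values, not taken from the exercises.
p_a, p_b = Fraction(1, 2), Fraction(3, 5)
low, high = and_bounds(p_a, p_b)
print(low, high)
print(p_or(p_a, p_b, low), p_or(p_a, p_b, high))
print(is_independent(p_a, p_b, Fraction(3, 10)))
```

Note that the largest "A and B" goes with the smallest "A or B", and the other way around.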

Chapter 15: The binomial formula

These exercises will be covered on Quiz 2, Apr 9.

1. Set A, p 258: 1-6 (feel free to use the binomial formula)
3. Review exercises, p 261: #1-11

1. 5 dice are rolled.
1. What is the probability of getting exactly two 3s?
2. What is the probability of getting exactly three 5s?
3. What is the probability of getting exactly two 3s and exactly three 5s?
4. What is the probability of getting exactly two 3s or exactly three 5s?
5. Are the events "exactly two 3s are rolled" and "exactly three 5s are rolled" independent?
2. Again, roll 5 dice. What is the probability of getting exactly two 3s and exactly one 4?
3. Calculate 1037 choose 1035 by hand.
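
For checking binomial-formula work, Python's standard library has `math.comb` for "n choose k". This is a rough sketch only; `binomial_probability` is a name invented here, and exact fractions are used so the output can be compared with hand arithmetic.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, k, p):
    """Chance of exactly k successes in n independent trials, each with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly two 3s when 5 dice are rolled.
p_two_threes = binomial_probability(5, 2, Fraction(1, 6))
print(p_two_threes)

# Choosing which 8 of 10 to take is the same as choosing which 2 to leave out.
print(comb(10, 8), comb(10, 2))
```

The symmetry in the last line is the reason a number like 1037 choose 1035 is reasonable to do by hand.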

Part II: Descriptive Statistics

Chapter 3: The histogram

These exercises will be covered on Quiz 3, Apr 16.

1. Set A, p 33: #4-7
4. Set D, p 44: #1, 2
5. Set E, p 46: #1, 2
6. Set F, p 48: #1
8. Review exercises, p 50: #1-3, 6-12

1. This exercise extends Set D #2 and is intended to help you keep your Part IV skills alive. We make the highly improbable assumption that there were an equal number of high school and college educated women in the group from which this data was drawn.
1. Suppose you choose a woman from the study at random. What is the probability she has three children?
2. Are the events "has one or two children" and "has a college degree" independent?
2. Download this data set of lengths (in inches) of the forearms of 140 adult males: comma-delimited (csv) or tab-delimited (txt). (From Peter Dunn, Datasets for statistical analysis.)
1. Find the minimum and maximum values of the data set.
2. Create a frequency histogram with bins of width 0.5 in (round the minimum down and the maximum up to multiples of 0.5 in for your endpoints). Remember your bin values are the top end of each range.
3. Create a frequency histogram with bins of width 0.25 in (round the minimum down and the maximum up to multiples of 0.25 in for your endpoints).
4. The histogram in part 3 is jagged in a particular way. Which bins occur with more frequency than their immediate neighbors? Do you think this is representative of adult male forearm lengths, or an artifact of the way the measurements were taken?
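
If you would rather not use a spreadsheet, the binning can be sketched in Python. Everything here is an assumption for illustration: the function name, the made-up sample (not the forearm data), and the convention that each bin includes its top end and is labeled by it.

```python
import math
from collections import Counter

def frequency_table(data, width):
    """Count values in bins (lower, upper], labeling each bin by its upper end."""
    lo = math.floor(min(data) / width) * width   # round the minimum down to a multiple of width
    counts = Counter()
    for x in data:
        # ceil sends a value sitting exactly on a boundary into the bin it tops;
        # max(1, ...) keeps a value equal to lo in the first bin.
        steps = max(1, math.ceil((x - lo) / width))
        counts[lo + steps * width] += 1
    return dict(sorted(counts.items()))

sample = [17.3, 18.0, 18.4, 18.5, 19.1, 19.5, 19.5, 20.2]  # made-up lengths in inches
table = frequency_table(sample, 0.5)
print(table)
```

Empty bins are simply left out of the result. Widths such as 0.5 and 0.25 are exact in binary floating point, which is why the boundary comparisons behave; a width like 0.1 would need more care.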

Chapter 4: The average and the standard deviation

These exercises will be covered on Quiz 3, Apr 16.

2. Set A, p 60: #1-9
3. Set B, p 65: #1, 2, 5, 6
4. Set C, p 67: #1-3, 6
5. Set D, p 70: #1-10
6. Set E, p 72: #1-12 (you might want to use Excel for #5-7; remember to use stdevp)
8. Review exercises, p 74: #1-12

1. For the forearm-length data set above, calculate the mean, median, mode, (population) standard deviation, first and third quartiles, and range. How well does the standard deviation represent the actual spread of the data? Given the histogram(s) you made previously and the values of mean versus median, would you expect it to represent the spread well?
2. Suppose I tell you the five-value summary for a set of plant-height measurements (in centimeters) is 2, 5, 6.5, 8, 13. What is the probability that a plant chosen at random from the sample has height at most 13 cm? At most 8 cm? Between 5 and 8 cm? At most 5 cm?
3. Suppose I now tell you that the mean of the plant heights is 6.5 cm and the standard deviation is 1.5 cm. What assumption do you need to make to draw conclusions about the probability of height when you choose a plant at random from the sample? Given that assumption, what is the probability that a plant chosen at random from the sample has height at most 9.5 cm? At most 8 cm? Between 5 and 8 cm? At most 5 cm?
4. In Exercise Set E (Section 6), #5-7 ask you to calculate the mean and standard deviation for pairs of related data sets. Interpret this graphically; for each of the problems, fill in the blank: The manipulation that produced data set (ii) from data set (i)       a/b/c/d       the histogram (multiple answers possible). In general, what will the effect on mean and standard deviation be when the histogram is       a/b/c/d      ? (For b and c, assume the left end of the histogram is fixed and the right end moves toward or away from it; for d assume the reflection is about the y-axis.)
a. translated (shifted)
b. dilated (stretched out)
c. compressed
d. horizontally reflected
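
One way to experiment with these manipulations before committing to an answer is to transform a small list and recompute. This is just a sketch with made-up values; `pstdev` from Python's `statistics` module is the population SD, which I believe matches what Excel's stdevp computes.

```python
from statistics import mean, pstdev  # pstdev divides by n, not n - 1

data = [1, 3, 4, 5, 7]                 # made-up values
shifted = [x + 10 for x in data]       # add a constant to every value
stretched = [3 * x for x in data]      # multiply every value by a constant
reflected = [-x for x in data]         # flip about zero

for name, values in [("original", data), ("shifted", shifted),
                     ("stretched", stretched), ("reflected", reflected)]:
    print(name, mean(values), pstdev(values))
```

Try other constants and compare the printed means and SDs with what the histograms would look like.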

Chapter 5: The normal approximation for data

These exercises will be covered on Quiz 3, Apr 16.

1. Set A, p 82: #1, 2
2. Set B, p 84: #1-5
3. Set C, p 88: #1-3
4. Set D, p 89: #1-3
5. Set E, p 92: #1-3
6. Set K, p 93: #1
7. Review exercises, p 93: #1-12

1. Convert the maximum and minimum of the forearm-length data set into standard units.
2. A certain measurement has been made that can range from 0 to 20 cm. A data set has been created from a set of measurements by converting each measurement into a fraction of the maximum, so 10 is converted to 0.5. The mean and standard deviation of the converted data set are 0.6 and 0.2.
1. What are the mean and standard deviation of the original measurements?
2. Translate outcomes 0.4 and 0.7 from the converted data set into both original measurements and standard units.
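
Conversions to and from standard units are short enough to write as functions. The names and numbers below are invented for illustration and are not the values from the exercises.

```python
def to_standard_units(x, mean, sd):
    """How many SDs the value x sits above (+) or below (-) the mean."""
    return (x - mean) / sd

def from_standard_units(z, mean, sd):
    """Inverse conversion: the raw value that sits z SDs from the mean."""
    return mean + z * sd

def rescale_summary(mean, sd, factor):
    """Mean and SD after every measurement is multiplied by `factor`."""
    return mean * factor, sd * abs(factor)

# Illustrative numbers only.
z = to_standard_units(65, mean=50, sd=10)
print(z)
print(from_standard_units(z, mean=50, sd=10))
print(rescale_summary(0.5, 0.1, 40))
```

A value's standard units do not change when the whole data set is rescaled by a positive factor, which is a handy check on part 2.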

Chapter 6: Measurement error

These exercises will not be quizzed on their own.

1. Review exercises, p 104: #1-4
2. Special review, p 105: #2-4, 6-12

Chapter 7: Plotting points and lines

Review on your own if necessary.

1. Chapter 15 Special Review (section 4), p 263: #5, 7-9, 18-20.
2. Find the correct probability for Example 3 in 15.2 (page 260).
3. If you are comfortable with this one you can feel confident of your skills in calculating probabilities: Six dice are rolled and the numbers showing are observed. Let event A be "exactly two 1s and exactly one 6 appear", and let event B be "at least one 5 appears". Find the probability of A and of B given A.
4. How many subsets does a set of 2 elements have? 3 elements? 4? n? [Note: the empty set and the entire original set both count as subsets.]
5. Suppose you know the value of 1000 choose 40; call it x. Find 1000 choose 960 and 1000 choose 41 in terms of x.

Part III: Correlation and Regression

Chapter 8: Correlation

These exercises will be covered on Quiz 4, Apr 30.

1. Set A, p 122: #1-6
2. Set B, p 128: #1-9
3. Set C, p 131: #1-4
4. Set D, p 134: #1-4 (for 1, feel free to use Excel)
5. Review exercises, p 134: #1-3, 6-11 (for 9, feel free to use Excel)

1. Suppose you have two normally-distributed data sets of equal size and are going to match their entries at random. What proportion of the matched pairs will have entries
1. each within one standard deviation of the mean?
2. each within two standard deviations of the mean?
3. each more than two standard deviations from the mean?
You may use the rounded values of 68% within one SD of the mean and 95% within two SDs of the mean.
2. Download this spreadsheet of data about wolves: csv, txt. It contains age, body length, and canine tooth length for a collection of 19 wolves in Alaska (data pulled from Johnson and Bhattacharyya, Statistics: Principles and Methods).
1. Compute the correlation coefficient for all three possible pairs of data. Which is the largest in magnitude?
2. Make a scatter plot of body length versus canine tooth length and add a linear trendline.
3. None of the correlations are very strong. Do you think there ought to be a relationship? If so, what might be hiding it? That is, in what way(s) might you subdivide the wolf population to look for a larger correlation coefficient in the smaller group?
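
If you want to compute a correlation coefficient without Excel, the "convert to standard units, multiply, and average" recipe can be sketched as follows. The function name and the five data pairs are made up for illustration; the wolf data would be read in from the file instead.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """r = average of the products of x and y, each in standard units."""
    mx, sx = mean(xs), pstdev(xs)
    my, sy = mean(ys), pstdev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return mean(products)

xs = [1, 3, 4, 5, 7]     # made-up values
ys = [5, 9, 7, 1, 13]
r = correlation(xs, ys)
print(round(r, 2))
```

The result does not depend on which list is called x and which y.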

Chapter 9: More about correlation

These exercises will be covered on Quiz 4, Apr 30.

1. Set A, p 143: #1-7, 10
2. Set B, p 145: #3, 4
3. Set C, p 148: #1-4
4. Set D, p 149: #1, 2
5. Set E, p 152: #1-4
6. Review exercises, p 153: #1-12

Chapter 10: Regression

These exercises will be covered on Quiz 4, Apr 30.

1. Set A, p 161: #1-5
2. Set B, p 163: #1-4
3. Set C, p 167: #1-5
4. Set D, p 174: #1-3
5. Set E, p 175: #1-3
6. Review exercises, p 176: #1-10

1. Download this spreadsheet of the boiling point of water (in degrees Fahrenheit) at different elevations in the Alps (as measured by barometric pressure in inches of mercury): csv, txt. This is from Dunn's Data Sets for Statistical Analysis.
1. Compute the correlation coefficient.
2. Suppose I measured the boiling point of water as 202.5 degrees. What would you predict the atmospheric pressure was?
3. Suppose I measured the atmospheric pressure as 21 inches of mercury. What would you expect the boiling point of water to be?
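
Parts 2 and 3 predict in opposite directions, so they use two different regression lines. Here is a minimal sketch of the regression estimate from the five summary statistics; the function name and all of the numbers are assumptions for illustration, not values from the boiling-point data.

```python
def regression_estimate(x, mean_x, sd_x, mean_y, sd_y, r):
    """Regression estimate of y at a given x: go up r SDs of y for each SD of x."""
    z_x = (x - mean_x) / sd_x
    return mean_y + r * z_x * sd_y

# Illustrative summary statistics only.
mean_x, sd_x, mean_y, sd_y, r = 10, 2, 50, 5, 0.6

y_hat = regression_estimate(12, mean_x, sd_x, mean_y, sd_y, r)
# Predicting x from y uses the other regression line: swap the roles of x and y.
x_hat = regression_estimate(y_hat, mean_y, sd_y, mean_x, sd_x, r)
print(y_hat, x_hat)
```

Notice that predicting back from `y_hat` does not return the original 12 unless r is 1 or -1.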

Chapter 11: The r.m.s. error for regression

These exercises will be covered on Quiz 5, May 7.

1. Set A, p 184: #1-8
2. Set B, p 187: #1-3
3. Set C, p 189: #1-3
4. Set D, p 193: #1-7
5. Set E, p 197: #1-3
6. Review exercises, p 198: #1-7, 8a, 9-12

Chapter 12: The regression line

These exercises will be covered on Quiz 5, May 7.

1. Set A, p 207: #1-4
2. Set B, p 210: #1-5
4. Review exercises, p 213: #1-11

Part V: Chance Variability

Chapter 16: The law of averages

These exercises will be covered on Quiz 5, May 7.

1. Set A, p 277: #1-9
3. Set B, p 280: #1-7
4. Set C, p 284: #1-3
5. Review exercises, p 285: #1-10

Chapter 17: The expected values and standard error

Exercises for sections 1 and 2 will be covered on Quiz 5, May 7.

1. Set A, p 290: #1-6
2. Set B, p 293: #1-7
3. Set C, p 296: #1-8
4. Set D, p 299: #1-4
5. Set E, p 303: #1-9
6. Review exercises, p 304: #1-14

Chapter 18: The normal approximation for probability histograms

We are skipping this chapter, except for the concept of a probability histogram, which we saw all the way back in Part II, and the basic idea of the Central Limit Theorem: the distribution of sums of draws approaches the normal curve as the number of draws increases, so for a large number of draws it is valid to use the normal curve to estimate percentages. However, you may find the following exercises useful for midterm preparation:
Section 2, Set A, p 312: #1-3, 5, 6
Section 4, Set B, p 318: #1, 3, 5
Section 5, Set C, p 324: #2, 5
Section 7, Review exercises, p 327: #1, 2, 7, 8, 10-15

The book actually does a good job covering the topics I would like you to be comfortable with. Here is one extra multi-part problem for part III.

After giving a large group of people two tests, time to win a video game and time to assemble a puzzle, the summary data is as follows.
video game mean: 4 minutes      video game SD: 1.5 minutes
puzzle mean: 3 minutes      puzzle SD: 1 minute
r = 0.7

1. What percentage of the people at the 80th percentile for the video game do you expect to also be at or above the 80th percentile for the puzzle?
2. What is the predicted score on the puzzle test for someone at the 80th percentile on the video game test?
3. If the actual score pair for someone is (5.3, 4), where the video game time is first and the puzzle time second, what is the error for that prediction?
4. Of course in timed tests we usually think of "better" scores as shorter times. For this problem, then, define 80th percentile to be the score less than or equal to 80% of the times. Does that change affect your answer to part 1? What does it make the answer to part 2? What would the time for the puzzle test be in part 3 to make the error the same as in part 3 but with the video game score now the 80th percentile defined in this way?
5. What assumption are you making to compute these estimates? How valid do you think it is - how well do you think your estimates should correspond to reality, especially in part 4 as compared to parts 1-3? [note: this question is a lot more vague than any that would appear on the exam and exists really for me to comment on the problem.]
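
For anyone who wants to check the arithmetic in parts 1 and 2 by computer, here is a rough sketch using `statistics.NormalDist` from Python's standard library. It assumes the scatter diagram is football-shaped, so that each vertical strip can be treated as roughly normal; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

game_mean, game_sd = 4.0, 1.5      # minutes, from the summary above
puzzle_mean, puzzle_sd = 3.0, 1.0  # minutes
r = 0.7

std_normal = NormalDist()                  # mean 0, SD 1
z80 = std_normal.inv_cdf(0.80)             # standard units at the 80th percentile
game_80th = game_mean + z80 * game_sd      # video game time at the 80th percentile

# Regression estimate for the puzzle time, in standard units and then in minutes.
predicted_z = r * z80
predicted_puzzle = puzzle_mean + predicted_z * puzzle_sd

# Spread of puzzle times around that estimate (r.m.s. error of the regression line).
rms_error = sqrt(1 - r**2) * puzzle_sd

# Share of that vertical strip at or above the puzzle's own 80th percentile,
# treating the strip as normal with the new mean and the r.m.s. error as its SD.
puzzle_80th = puzzle_mean + z80 * puzzle_sd
strip = NormalDist(predicted_puzzle, rms_error)
share_above = 1 - strip.cdf(puzzle_80th)

print(round(game_80th, 2), round(predicted_puzzle, 2),
      round(rms_error, 2), round(share_above, 2))
```

Working from the normal table by hand will give slightly different digits because of rounding.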

Part VI: Sampling

Chapter 20: Chance errors in sampling

These exercises will be covered indirectly on Quiz 6, May 21.

2. Set A, p 361: #1-8
3. Set B, p 366: #1-3
4. Set C, p 370: #1-5
6. Review exercises, p 371: #1-12 (1 ton = 2000 lbs)

Chapter 21: The accuracy of percentages

These exercises will be covered indirectly on Quiz 6, May 21.

1. Set A, p 379: #1-8
2. Set B, p 383: #1-4
3. Set C, p 386: #1-8
4. Set D, p 388: #1, 2
5. Skip
6. Review exercises, p 391: #2-5, 7-9, 12-15

Download this spreadsheet of the weights of Cherry Ripe candy bars (an Australian brand): csv, txt. This is from Dunn's Data Sets for Statistical Analysis.

1. Compute 68% and 95% confidence intervals for the weight of the bars (note your intervals will be in raw numbers, not percentages, but the method is the same, using the appropriate SE).
2. Unfortunately since the sample is small the intervals are wide, but Dunn says every time he has done this experiment with students the weights have been over the advertised weight of 18 grams. If that is a real phenomenon, why do you think it occurs?
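
The interval calculation in exercise 1 can be sketched as below. The function name and the sample are invented for illustration (the values are chosen to give round numbers, not to resemble real candy bars), and the sketch uses the sample SD as a stand-in for the population SD with no small-sample correction.

```python
from math import sqrt
from statistics import mean, pstdev

def confidence_interval(sample, z):
    """Interval of the form average +/- z standard errors."""
    se = pstdev(sample) / sqrt(len(sample))   # SE for the average
    center = mean(sample)
    return center - z * se, center + z * se

weights = [17] * 8 + [21] * 8   # made-up values: average 19, SD 2, n = 16
print(confidence_interval(weights, 1))   # roughly 68%
print(confidence_interval(weights, 2))   # roughly 95%
```

Here the SE is 2 / 4 = 0.5, so the two intervals are 19 plus or minus 0.5 and 19 plus or minus 1.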

Part VIII: Tests of Significance

Chapter 26: Tests of significance

These exercises will be covered on Quiz 6, May 21.

1. Set A, p 476: #1-5
2. Set B, p 478: #1-5
3. Set C, p 481: #1-8
4. Set D, p 482: #1-5
5. Set E, p 486: #1-10
6. Skip
7. Review exercises, p 495: #1-5, 12 (data for #12 as csv or txt)

1. Out of 500 people surveyed, 43 were regular consumers of Hi-Brow Bran Flakes. Using the z-test, test the null hypotheses that (a) 5% of people are regular consumers of Hi-Brow, and (b) 10% of people are regular consumers of Hi-Brow. Can you conclude anything about the relative probability of these hypotheses?
2. Is there any sample proportion such that the z-test would support both null hypotheses (a) and (b) of exercise 1 simultaneously?
3. Is there any sample proportion such that the z-test would simultaneously support both null hypotheses (c): 20% of people are regular consumers of Hi-Brow and (d): 25% of people are regular consumers of Hi-Brow?
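
A z-test for a sample proportion is a few lines of arithmetic. The sketch below uses invented names and illustrative numbers rather than the Hi-Brow survey; it computes the SE under the null hypothesis and reports the area in one tail of the normal curve.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(successes, n, p_null):
    """z statistic and one-tailed P-value for a sample proportion."""
    observed = successes / n
    se = sqrt(p_null * (1 - p_null) / n)      # SE for the proportion, assuming the null
    z = (observed - p_null) / se
    one_tail = 1 - NormalDist().cdf(abs(z))   # area beyond |z| in one tail
    return z, one_tail

# Illustrative numbers: 60 out of 100, tested against a null of 50%.
z, p_value = z_test_proportion(60, 100, 0.50)
print(round(z, 2), round(p_value, 4))
```

Double the tail area if you want a two-tailed P-value.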

Chapter 28: The chi-square test

Exercises from Set A and B will be covered on Quiz 6, May 21.

2. Set A, p 531: #1-10 (for #3-7, 9 use Excel if desired)
3. Set B, p 535: #3
4. Set C, p 539: #1-7
5. Review exercises, p 540: #1-3, 5-10

1. From Johnson and Bhattacharyya, Statistics: Principles and Methods: Out of 100 people who volunteered to donate blood, the frequency of blood types was as follows: 40 O, 44 A, 10 B, and 6 AB. Use the χ² method to test the hypothesis that (a) all four blood types are equally distributed in the population of potential donors, and (b) O and A are each four times as common as B or AB. Use Excel if you like. Can you conclude anything about the relative probability of these hypotheses?
2. A market research firm sends surveys to 900 firms, 300 each of small, medium, and large sized. The number of surveys returned was 200 for the small firms, 175 for the medium-sized firms, and 155 for the large firms. Are response rate and size of company independent?
3. From Moore, McCabe, and Craig, Introduction to the Practice of Statistics: Some data from the early 1990s on on-time and delayed flights by two airlines at two airports is reported below. (a) In terms of delay, is there a real difference between Alaska Airlines and America West? (b) In terms of delay, is there a real difference between Los Angeles and Phoenix?
                Los Angeles          Phoenix
                On Time  Delayed     On Time  Delayed
America West    694      117         4840     415
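
If you prefer not to use Excel for the chi-square exercises, the statistic itself is easy to compute. This is a sketch with invented function names and illustrative data (not the data from the exercises); it stops at the statistic and degrees of freedom, leaving the P-value to the table.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_if_independent(table):
    """Expected counts for a two-way table: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def flatten(table):
    return [x for row in table for x in row]

# Goodness of fit: 60 rolls of a die, 10 expected per face, 6 - 1 = 5 degrees of freedom.
rolls = [4, 6, 17, 16, 8, 9]
stat_fit = chi_square(rolls, [10] * 6)

# Independence: a made-up 2 x 2 table, (2 - 1) * (2 - 1) = 1 degree of freedom.
table = [[10, 20], [30, 40]]
expected = expected_if_independent(table)
stat_indep = chi_square(flatten(table), flatten(expected))

print(round(stat_fit, 2), expected, round(stat_indep, 3))
```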

Chapter 27: More tests for averages

1. Set A, p 503: #1-5
2. Set B, p 506: #1-9
3. Skip
4. Skip
5. Skip
6. Review exercises, p 518: #1-3

You may find some exercises from Chapter 23 useful as well. I especially encourage you to read Section 3, "Which SE?" and do the Set C exercises.

1. Set A, p 413: #1-5, 9
2. Set B, p 420: #1, 6
3. Set C, p 423: #1-3, 6
4. Set D, p 424: #6, 7
5. Review exercises, p 425: #1, 2a, 3a, 10a

• Chapter 21 Section 6 review, p 391: #1-5, 7-15 (slightly extends listing above)
• Chapter 22 Section 8 review, p 405: #5, 6, 10-12
• Chapter 23 Section 6 special review, p 428: #4-6, 8-18, 21-23, 26, 27, 29, 30
For #13 interpret "an" as "at least one", with an inclusive "or"; note that #26 is a z-test in disguise.
• Chapter 29 Section 1 Set A, p 546: #1, 2; Section 3 Set C, p 554: #1, 4, 5, 7; Section 7 review, p 563: #1, 2, 4
• Chapter 29 Section 8 special review to end all reviews, p 565: #5-8, 9b, 10, 11, 13-16, 20, 21, 25, 26, 34, 37.

Supplemental Review Exercises:

1. A computer program is written to simulate two draws with replacement from a box containing four tickets numbered 1 through 4. 1600 runs of the program result in the following frequencies for the possible sequences of draws, which seem not quite right to one of the programmers:
draw  freq    draw  freq    draw  freq    draw  freq
1 1    93     1 2   100     1 3    91     1 4   105
2 1   108     2 2   102     2 3   104     2 4    82
3 1    89     3 2    86     3 3   114     3 4   120
4 1   100     4 2    97     4 3    94     4 4   115
1. Check the distribution of values on the first draw.
2. Check the distribution of values on the second draw.
3. Check the distribution of values on second draw for each first draw value.
4. State your results in terms of (conditional) probability: what seems to be wrong with the program? (this answer will be hidden so if you don't want to do the first three parts you can read their answers and still try this one)
2. Deal five cards from a well-shuffled deck. Let event A be that the first card is a face card and the remaining cards are half black and half red. Let event B be that the first card is a heart and the remaining cards are all non-face cards (A-10) in diamonds and spades. What is the probability of the event A or B?
3. For a certain large group of exercisers, the duration of each workout and the number of workouts per week are slightly negatively correlated. The average duration is 1.5 hours and the average frequency is 2.75 times per week, with SDs of 0.5 and 1, respectively. The correlation is r = -0.3 and the data is homoscedastic.
1. Approximately what percentage of the exercisers who work out for 1.5 hours at a time work out at least three times per week?
2. Approximately what percentage of the exercisers who work out twice a week work out for no more than 1 hour at a time?
3. If you choose an exerciser from the group at random, what is the probability that
1. he or she works out at least three times a week?
2. he or she works out at most one hour at a time?
4. For a recent graduating class of over one million high school students, the ACT and SAT scores were essentially normally distributed with ACT mean 20.8 and standard deviation 4.8 points, and SAT mean 1026 with standard deviation 209 points (the ACT is on a scale from 1 to 36 and the SAT from 400 to 1600).
1. If ACT and SAT scores are perfectly positively correlated, what score on the SAT is equivalent to a score of 30 on the ACT?
2. If ACT and SAT scores are perfectly positively correlated, what score is better, 1240 on the SAT or 25 on the ACT?
3. What is the 80th percentile for ACT scores?
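
For supplemental exercise 1, the tallies are tedious by hand, so here is one possible way to set them up. The dictionary layout and variable names are my own choices; the frequencies are copied from the table in the exercise.

```python
# Keys are (first draw, second draw); values are frequencies from the exercise.
freq = {
    (1, 1): 93,  (1, 2): 100, (1, 3): 91,  (1, 4): 105,
    (2, 1): 108, (2, 2): 102, (2, 3): 104, (2, 4): 82,
    (3, 1): 89,  (3, 2): 86,  (3, 3): 114, (3, 4): 120,
    (4, 1): 100, (4, 2): 97,  (4, 3): 94,  (4, 4): 115,
}

total = sum(freq.values())
first_totals = {v: sum(f for (a, b), f in freq.items() if a == v) for v in range(1, 5)}
second_totals = {v: sum(f for (a, b), f in freq.items() if b == v) for v in range(1, 5)}

# Relative frequency of each second draw, given the first draw.
conditional = {(a, b): f / first_totals[a] for (a, b), f in freq.items()}

print(total, first_totals, second_totals)
print({pair: round(p, 3) for pair, p in conditional.items()})
```

A fair program would give about 400 for each marginal total and about 0.25 for each conditional frequency; deciding which departures are too large to be chance is the point of the exercise.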

Examples from Class: (answers only, fully worked out)

A simple random sample of 400 people from a population of 100,000 is surveyed in the fall, and another sample the following spring. The fall sample showed 41% of people in favor of repaving downtown; the average time per week spent downtown was 4 hours with a standard deviation of 2 hours. In the spring, 45% were in favor of repaving and the average weekly time spent downtown was 3.5 hours with a standard deviation of 1.75 hours. For each of the two things measured, find (a) a 95% confidence interval for the population parameter for the fall survey, and (b) whether it appears there was a real change from the fall to the spring.
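
The standard-error arithmetic for this example can be sketched as follows. The function names are invented, and the sketch assumes the fall and spring samples are independent and that 400 is small enough relative to 100,000 that no correction factor is needed.

```python
from math import sqrt

n = 400

def se_percent(p, n):
    """SE for a sample proportion p (as a fraction) from n draws."""
    return sqrt(p * (1 - p) / n)

def se_mean(sd, n):
    """SE for a sample average, using the sample SD."""
    return sd / sqrt(n)

def z_for_difference(first, se_first, second, se_second):
    """z statistic for (second - first), with the SEs combined by the square-root rule."""
    se_diff = sqrt(se_first**2 + se_second**2)
    return (second - first) / se_diff

# (a) Approximate 95% intervals for the fall survey: estimate +/- 2 SE.
se_fall_pct = se_percent(0.41, n)
ci_fall_pct = (0.41 - 2 * se_fall_pct, 0.41 + 2 * se_fall_pct)
se_fall_hours = se_mean(2.0, n)
ci_fall_hours = (4.0 - 2 * se_fall_hours, 4.0 + 2 * se_fall_hours)

# (b) z statistics for the change from fall to spring.
z_percent = z_for_difference(0.41, se_fall_pct, 0.45, se_percent(0.45, n))
z_hours = z_for_difference(4.0, se_fall_hours, 3.5, se_mean(1.75, n))

print(ci_fall_pct, ci_fall_hours)
print(round(z_percent, 2), round(z_hours, 2))
```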

1. Roll a die five times. What is the probability of getting a 1 first, and then half even numbers and half odd numbers?
2. Deal four cards from a well-shuffled deck. What is the probability of getting exactly one heart and exactly one face card?
3. Roll six dice. What is the probability of getting at least one 3 or 4?

A person claiming to have invented a new source of energy releases data from successive trial runs of the machine. The data, converted into standard units via the average power produced and the standard deviation of the values, is presented in terms of unit intervals below. One would expect the amounts to be normally distributed; at this level of detail do they appear to be?

z          freq
< -2       2
-2 to -1   9
-1 to 0    25
0 to 1     28
1 to 2     12
> 2        2
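
To compare the table with the normal curve, you can compute the frequencies a normal distribution would predict for the same number of trials. This is a small sketch using `statistics.NormalDist`; the variable names are my own.

```python
from statistics import NormalDist

observed = [2, 9, 25, 28, 12, 2]   # frequencies from the table above
cuts = [-2, -1, 0, 1, 2]           # interval boundaries in standard units
total = sum(observed)

std_normal = NormalDist()          # mean 0, SD 1
cdf_values = [0.0] + [std_normal.cdf(c) for c in cuts] + [1.0]
areas = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]
expected = [total * a for a in areas]

for o, e in zip(observed, expected):
    print(o, round(e, 1))
```

Whether the differences between the two columns are small enough is a judgment call at this level of detail; a chi-square test is one way to make it precise.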
