Math 205 FG               Test 1, individual portion SOLUTIONS
Fall 2008                      Dr. Fenton

1. (15 pts.) A study by researchers at Cornell University, done at a highway rest stop in the state of Washington, found that men take an average of 45 seconds to use the restroom and women take an average of 79 seconds.

            What are the variables in this study? What type of variables are they? Which is the explanatory variable and which is the response variable?
            Do you think this is a cause-&-effect situation? Explain why you think this. Include a diagram in your explanation.

The variables studied by these researchers are gender and time spent in the restroom. Gender is a categorical variable and time is quantitative.  In this situation, gender is explanatory and time is response.
            This is cause-&-effect. Biological and cultural differences cause women to take longer.

2. (15 pts.) The histograms here show four sampling distributions intended to estimate the same parameter. Label each as high or low bias. Explain how you decide this.

(Sorry, I cannot reproduce the diagrams electronically.) To decide on bias, you must decide whether the center of the sampling distribution matches the population parameter. Items (a) and (d) show high bias because the center is clearly different from the population parameter. Items (b) and (c) show low bias because their center is close to the parameter. (By the way, this was a homework problem.)

3. (15 pts.) The blood cholesterol levels of middle-aged men are approximately Normal with mean 222 mg/dl and standard deviation 37 mg/dl.

  1. The recommended cholesterol level is 180 mg/dl or below.   What percentage of middle-aged men meet this recommendation? Write down the calculator commands you use to find this percentage.
  2. What percentage of men have a high cholesterol level, above 240 mg/dl? Again, write down the calculator commands you use to find this percentage.
  3. At my last blood test, my cholesterol level was 135 mg/dl. Find my z-score. Was this value in the middle 68% of the distribution? In the middle 95%? Explain how you know.

  1. normalcdf(0,180,222,37) produces 0.128159, so about 13%.
  2. normalcdf(240,1000,222,37) produces 0.313311, so about 31%. The 1000 can be replaced by any large number. In theory, we want the interval from 240 to infinity. To accommodate the calculator, we use a large number instead. (Anything four or five standard deviations above the mean or higher is fine.)
  3. The z-score is (135 – 222) / 37 = -2.35. Since this is more than two standard deviations below the mean, my value does not lie in the middle 68% nor in the middle 95%. However, it does lie in the middle 99.7%.
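
If you do not have the calculator handy, the three answers can be checked with a short Python sketch. It assumes the standard-library `statistics.NormalDist` as a stand-in for `normalcdf`; the variable names are my own and are only illustrative.

```python
from statistics import NormalDist

# Cholesterol model from the problem: Normal, mean 222 mg/dl, sd 37 mg/dl.
chol = NormalDist(mu=222, sigma=37)

# Part 1: proportion at or below 180 mg/dl.
# This agrees with normalcdf(0,180,222,37) because the area below 0 is negligible.
p_recommended = chol.cdf(180)

# Part 2: proportion above 240 mg/dl, computed as an exact upper tail
# instead of substituting a large number for infinity.
p_high = 1 - chol.cdf(240)

# Part 3: z-score for a reading of 135 mg/dl.
z = (135 - 222) / 37

print(round(p_recommended, 4), round(p_high, 4), round(z, 2))
```

Because the z-score is below -2, the reading falls outside the middle 95% but, being above -3, inside the middle 99.7%.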

4. (20 pts.) One of the factors thought to contribute to the incidence of skin cancer is ultraviolet (UV) radiation from the sun. The amount of UV radiation a person receives depends on the person’s latitude. The following table gives the rates of malignant skin cancer (melanoma) and the degrees north latitude for nine locations in the United States.

Degrees north latitude (x)             32.8   33.9   34.1   37.9   40.0   40.8   41.7   42.2   45.0
Melanoma rate per 100,000 people (y)    9.0    5.9    6.6    5.8    5.5    3.0    3.4    3.1    3.8

  1. What is the correlation between the melanoma rate and the degrees of north latitude?
  2. What is the equation for the least-squares regression line for this problem?
  3. The latitude of Louisville is 38.1 degrees north. Using the regression line, predict the melanoma rate for Louisville.
  4. How reliable do you think your prediction is? Explain why.

  1. Put the data into two lists. Calculate LinReg(ax+b).
    The correlation coefficient is r=-0.8573.
  2. y = -0.39875x + 20.55828  (You did not have to show this many decimal places.)
  3. For x=38.1, the regression line predicts y = 5.366 cases per 100,000 people.
  4. This is a pretty reliable prediction, for two reasons: The correlation is strong and 38.1 is in the middle of the x-data (interpolation).
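
The LinReg(ax+b) output can also be reproduced by hand or in code. Here is a minimal Python sketch, assuming the usual sum-of-squares formulas for the slope, intercept, and correlation; the function name `least_squares` is hypothetical and not a calculator command.

```python
from math import sqrt

latitude = [32.8, 33.9, 34.1, 37.9, 40.0, 40.8, 41.7, 42.2, 45.0]
melanoma = [9.0, 5.9, 6.6, 5.8, 5.5, 3.0, 3.4, 3.1, 3.8]

def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = least_squares(latitude, melanoma)

# Prediction for Louisville at 38.1 degrees north latitude.
louisville = slope * 38.1 + intercept

print(round(r, 4), round(slope, 5), round(intercept, 5), round(louisville, 3))
```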

5. (20 pts.) The Praxis Exam is a national exam to test a future teacher's knowledge of the subjects he or she intends to teach. A passing grade is required for certification. The following table shows the number of students who passed the Praxis Exam in 2001 at seventeen of the private institutions in Kentucky.

Institution       Number     Institution       Number
Alice Lloyd         14       Ky. Wesleyan        20
Asbury              46       Lindsey Wilson      21
Bellarmine          91       Midway               9
Berea               11       Pikeville           18
Brescia             12       Spalding            43
Campbellsville      32       Thomas More         15
Cumberland          35       Transylvania        25
Georgetown          25       Union               37
Ky. Christian       16

  1. Create a stemplot of this data. Then describe the distribution.
  2. Find the five-number summary of this data.
  3. Draw a modified boxplot (with outliers) for the data.

  1. 0 | 9
    1 | 1 2 4 5 6 8
    2 | 0 1 5 5
    3 | 2 5 7
    4 | 3 6
    5 |
    6 |
    7 |
    8 |
    9 | 1
    This distribution is skewed to the right, it is unimodal, and there is an outlier.
  2. The five-number summary, which can be done by hand or by the calculator, is
    min=9, Q1=14.5, Med=21, Q3=36, max=91
  3. The boxplot should look basically like this. It should show 91 as an outlier.

[Figure: boxplot of PRAXIS data]
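
The five-number summary and the outlier check for the modified boxplot can be verified with a short Python sketch. It assumes the quartile convention used in this course (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when the count is odd) and the usual 1.5 × IQR rule; the function name is hypothetical.

```python
from statistics import median

passed = [14, 46, 91, 11, 12, 32, 35, 25, 16, 20, 21, 9, 18, 43, 15, 25, 37]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half, excluding the median when n is odd
    upper = data[-half:]   # upper half, excluding the median when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

low, q1, med, q3, high = five_number_summary(passed)

# 1.5 * IQR rule: anything beyond the fences is plotted as an outlier.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in passed if v < lower_fence or v > upper_fence]

print((low, q1, med, q3, high), outliers)
```

Other software may use a different quartile convention, so its Q1 and Q3 can differ slightly from these values.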

6. (15 pts.) In 1997 federal inspectors found that about forty percent of the cartons of milk in the United States were not filled to the amount stated on the carton. The Milk Industry Foundation, a national trade group, promised efforts to correct this problem. Now we want to know if things have improved.

            For a nation-wide issue, a simple random sample is not a practical way to collect data. Design a more practical sampling procedure to collect data on this question. Give a complete description of your procedure, including the type of sampling design you chose.

There are many correct ways to do this. One approach is to randomly select two grocery stores in each state. Then select a SRS of milk cartons from these stores. This is a multi-stage approach.
            Another possibility is a stratified random sample. One way to do this is to choose a SRS at grocery stores and another SRS at convenience stores. (This is a pretty skimpy description. You should give more detail.)
            It is also legitimate to make the observations at the distributors. However, this still must be a sample of the distributors—it is highly impractical to examine all milk distributors in the U.S.
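
To make the multi-stage idea concrete, here is a minimal Python sketch of the first approach. Everything about the sampling frame is hypothetical: the store lists, the 40 stores per state, the 200 cartons on each shelf, and the choice of 10 cartons per store are assumptions made only for illustration.

```python
import random

rng = random.Random(205)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 50 states, each with a list of 40 grocery stores.
states = [f"state_{s:02d}" for s in range(50)]
stores_by_state = {s: [f"{s}_store_{i}" for i in range(40)] for s in states}

STORES_PER_STATE = 2     # from the solution: two stores in each state
CARTONS_PER_STORE = 10   # assumed; the solution does not give a number
CARTONS_IN_STOCK = 200   # assumed shelf stock at every store

sample = []
for state in states:
    # Stage 1: randomly select stores within the state.
    for store in rng.sample(stores_by_state[state], STORES_PER_STATE):
        # Stage 2: take an SRS of the cartons on that store's shelf.
        for carton in rng.sample(range(CARTONS_IN_STOCK), CARTONS_PER_STORE):
            sample.append((state, store, carton))

print(len(sample))
```

Each selected carton would then be measured and recorded as filled or underfilled.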

Test 1, group portion SOLUTIONS

1. (10 pts.) "Pulse rates go up when taken by a member of the opposite sex."
Design an experiment to decide whether this statement is true. Include a description of your experiment and of what observations would be needed. Also include a diagram of your experiment.

There are many correct ways to do this. The critical issues are comparison (experimental group versus control group), random allocation of subjects, and repeating the experiment with many subjects.
Many of you chose to do a matched-pairs experiment, in which each subject is measured twice. In this design, each subject is part of both the experimental group and the control group.

2. (10 pts.) Here is a histogram of the average attendance at NFL stadiums in the 2003-2004 season.

  1. Describe the shape of this distribution.
  2. Estimate the value of the median. Explain how you decide this.
  3. Is the value for the mean larger or smaller than the median? Why do you think so?

[Figure: histogram of NFL data]

    1. This distribution is unimodal, slightly skewed to the left, and there is an outlier.
    2. Since there are 32 stadiums (add up the heights of the bars), the median lies between the 16th and 17th data values. This is somewhere at the low end of the tallest bar. So the median is about 68 thousand.
    3. The outlier on the left (Arizona) will pull the mean downwards. So the mean will be lower than the median.

3. (20 pts.) Shown below are the capacities and enrollments for 2007-2008 at twenty public high schools in Jefferson County.

  1. Draw a scatterplot for this data. Describe your graph.
  2. Give the equation for the regression line and add it to your graph.
  3. Does the school with the highest capacity have the highest enrollment? Explain how you can tell this from the scatterplot.
  4. The school with the lowest capacity is Shawnee High School, which also has the lowest enrollment. Is Shawnee an outlier in this scatterplot?
  5. In 1998, the correlation between a school’s capacity and enrollment was r = 0.745.  What is the correlation coefficient for 2007-2008? What does this new value tell you about the schools?

School          Capacity  Enrollment    School      Capacity  Enrollment
Atherton          1145       1076       Male          1792       1739
Ballard           1710       1733       Manual        1878       1878
Butler            1661       1637       Moore         1000        775
Central            982        968       PRP           1825       1893
Doss              1225       1089       Seneca        1755       1628
Eastern           1975       1916       Shawnee        695        608
Fairdale           884        825       Southern      1342       1219
Fern Creek        1475       1450       Valley         842        827
Iroquois          1242       1166       Waggener      1156       1084
Jeffersontown     1132       1103       Western        938        909

  1. (Your scatterplots were really good!) There is a strong positive association between these variables. We can see this in the upward trend of the data points and in how well these points line up.
  2. The regression line is y = 1.0495x – 122.4789.
  3. Yes, the school with the highest capacity has the highest enrollment. This point is furthest to the right and also the highest vertically.
  4. Shawnee is not an outlier. It lies almost exactly on the regression line, which means it fits the pattern of the data.
  5. The new correlation coefficient is r = 0.9894, a very strong value. This means that the schools’ enrollments match the capacities very closely, much more so than in 1998.
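
The regression line, the correlation, and the claim about Shawnee can all be checked numerically. The Python sketch below assumes the standard least-squares formulas and uses the residual (observed enrollment minus predicted enrollment) as the measure of how far Shawnee sits from the line; the dictionary layout and variable names are my own.

```python
from math import sqrt

# (capacity, enrollment) for 2007-2008, from the table above.
schools = {
    "Atherton": (1145, 1076), "Ballard": (1710, 1733), "Butler": (1661, 1637),
    "Central": (982, 968), "Doss": (1225, 1089), "Eastern": (1975, 1916),
    "Fairdale": (884, 825), "Fern Creek": (1475, 1450), "Iroquois": (1242, 1166),
    "Jeffersontown": (1132, 1103), "Male": (1792, 1739), "Manual": (1878, 1878),
    "Moore": (1000, 775), "PRP": (1825, 1893), "Seneca": (1755, 1628),
    "Shawnee": (695, 608), "Southern": (1342, 1219), "Valley": (842, 827),
    "Waggener": (1156, 1084), "Western": (938, 909),
}

x = [cap for cap, _ in schools.values()]
y = [enr for _, enr in schools.values()]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)

# Residual for Shawnee: observed enrollment minus the line's prediction.
cap, enr = schools["Shawnee"]
shawnee_residual = enr - (slope * cap + intercept)

print(round(slope, 4), round(intercept, 4), round(r, 4), round(shawnee_residual, 1))
```

A residual of only a few students, against enrollments in the hundreds and thousands, is what "almost exactly on the regression line" means here.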

4. (10 pts.) Head Start is a federal program that provides preschool education to children from low-income families. One study of its effectiveness examined 150 male subjects from Harlem, NY once they reached age 21. Here are the results.

                Head Start     Not
In college          24          15
Working             43          33
Neither              8          27

Find the conditional distributions for the columns. Use these values to draw graphs comparing the Head Start participants to the others. What differences do you observe?

The conditional distributions for the columns are
                            Head Start        Not
            In college          32%           20%
            Working             57%           44%
            Neither             11%           36%

These are categorical variables, so the graphs can be bar graphs or pie charts. In these graphs, it is clear that the Head Start participants have higher percentages in the College and Working categories and a much lower percentage in the Neither category.
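
The column percentages can be reproduced by dividing each count by its column total (75 in both columns). A minimal Python sketch, with a dictionary layout of my own choosing:

```python
# Counts from the two-way table, organized by column.
counts = {
    "Head Start": {"In college": 24, "Working": 43, "Neither": 8},
    "Not": {"In college": 15, "Working": 33, "Neither": 27},
}

conditional = {}
for group, outcomes in counts.items():
    total = sum(outcomes.values())  # column total
    # Each entry becomes a percentage of its own column, rounded to a whole percent.
    conditional[group] = {
        outcome: round(100 * count / total) for outcome, count in outcomes.items()
    }

for group, percents in conditional.items():
    print(group, percents)
```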