statistics

CONTINGENCY TABLES

Assignment 1

Identifying Risks and Hazards—Part 1

Instructions

Open SPSS and then use the File > Open > Data command to open the dataset asbestos.sav. It contains data on the incidence of lung cancer among people exposed to
asbestos and people not exposed.

Run a chi-square test for independence and an odds ratio analysis. Then, provide your response to all items in this worksheet.

Note: You will use the SPSS Output generated in this Assignment 1 to complete Assignment 2 as well.

———————————————————————————————————————

Step 1: Set up your hypothesis and determine the level of significance
1. State the alternative/research hypothesis in written format.

Answer: ____ / 6 points

2. State the null hypothesis in written format.

Answer: ____ / 6 points

3. What is your level of significance (choose α = .05 or α = .01)?

Answer: ____ / 4 points
Step 2: Select the appropriate test statistic

4. What is the appropriate test statistic?

Answer: ____ / 4 points

Step 3: Set up the decision rule

5. Based on the level of significance you set in Step 1 and the test statistic you selected in Step 2, what is your decision rule?

Note: Because there are 2 rows and 2 columns, the degrees of freedom are (2 - 1) x (2 - 1) = 1.

Answer: ____ / 6 points
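If you want to check your decision rule by hand, the short Python sketch below computes the critical value of χ² for 1 degree of freedom at either level of significance. It uses only the standard library and relies on the fact that, with df = 1, a chi-square variable is the square of a standard normal variable. The function names are hypothetical helpers, not part of SPSS.

```python
from statistics import NormalDist


def chi2_critical_df1(alpha: float) -> float:
    """Critical value of the chi-square distribution with 1 degree of freedom.

    With df = 1, chi-square = Z**2, so P(chi-square > c) = alpha
    when c is the squared two-sided z cut-off z(1 - alpha/2).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


def reject_null(chi2_value: float, alpha: float) -> bool:
    """Decision rule: reject H0 if the computed chi-square exceeds the critical value."""
    return chi2_value > chi2_critical_df1(alpha)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 if chi-square > {chi2_critical_df1(alpha):.3f}")
# alpha = 0.05: reject H0 if chi-square > 3.841
# alpha = 0.01: reject H0 if chi-square > 6.635
```

The same two cut-offs (3.841 and 6.635) appear in standard chi-square tables for df = 1.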

Step 4: Compute the test statistic

· Open SPSS and then use the File > Open > Data command to open the dataset asbestos.sav.

· Select: Analyze > Descriptive Statistics > Crosstabs

· Move asbestos to the Row(s) box and lungca to the Column(s) box.

· Click Statistics, make sure Chi-square and Risk are checked, and click Continue.

· Click Cells, check Observed and Expected (frequencies), and click Continue. Then click OK.
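The Pearson chi-square that SPSS reports can also be reproduced by hand. The sketch below is a minimal Python version for a 2 x 2 table, assuming no continuity correction, so it should correspond to the Pearson Chi-Square row of the SPSS output rather than the Continuity Correction row. Because the asbestos.sav counts are not reproduced in this worksheet, it uses the smoking example from the Step-by-Step Guide as stand-in data; the function name is a hypothetical helper.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test of independence for a 2 x 2 table.

    Rows are exposure (yes, no); columns are outcome (case, control).
    Returns the expected counts, the chi-square statistic, and the
    p-value for df = 1. No continuity correction is applied.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    # Expected count under independence: (row total x column total) / N
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]
    # Sum of (observed - expected)^2 / expected over the four cells
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # For df = 1, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


expected, chi2, p = pearson_chi2_2x2(40, 29, 10, 21)
print(expected)         # [[34.5, 34.5], [15.5, 15.5]]
print(round(chi2, 3))   # 5.657
print(p)
```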

6. What is the χ² value computed by your SPSS analysis?

Answer: ____ / 4 points

7. Paste your SPSS output here:
Answer: ____ / 20 points

Step 5: Conclusion

8. Do you reject or fail to reject the null hypothesis?
Answer: ____ / 6 points

9. Is there an association between asbestos exposure and lung cancer?

Answer: ____ / 4 points

ASSIGNMENT 2

Instructions

You will refer back to and use the SPSS output generated in Assignment 1 for this assignment as well. This output is based on the asbestos.sav dataset, which relates to
the incidence of lung cancer among people exposed to asbestos and people not exposed. Refer back to that output and then provide your response to all items in
this worksheet.

———————————————————————————————————————

1. From the SPSS output generated in this week’s Assignment 1, copy only the odds ratio (“Risk Estimate”) portion of the output and paste it below:
Answer: ____ / 10 points

2. The p-value associated with a chi-square test only suggests whether or not the results are statistically significant. Why is it important to also look at the
odds ratio?

Answer: ____ / 10 points

3. What does the odds ratio value in the SPSS output tell you, specifically, about lung cancer and exposure to asbestos?

Answer: ____ / 10 points

4. Based on your answer above, would you say there is a strong association between asbestos exposure and lung cancer?

Answer: ____ / 10 points

5. From the SPSS output generated in this week’s Assignment 1, copy only the “asbestos * lung cancer Crosstabulation” portion of the output and paste it below:
Answer: ____ / 10 points

6. Using the formula provided in this week’s Learning Resources and the data in the cross-tabulation output, calculate the odds ratio and show your work in your
answer below.

Answer: ____ / 10 points
INSTRUCTIONS

Step-by-Step Guide for calculating odds ratios and risk ratios

This Step-by-Step Guide demonstrates how to calculate odds ratios, risk ratios, cumulative incidence, incidence density, and prevalence.

Odds Ratio (OR):

The most common use of probability or odds in Public Health is the odds ratio. This is calculated from a simple 2 x 2 table set up as follows:

Existence of disease, designated as case (yes, disease) or control (no disease):

Exposure to variable of interest | Case (disease: yes) | Control (disease: no) | Total
Yes, exposed                     | a                   | b                     | All exposed (a + b)
No, not exposed                  | c                   | d                     | All not exposed (c + d)
Total                            | All cases (a + c)   | All controls (b + d)  | Total sample

We can take this table and fill in the values from page 29 in our text:

Existence of cancer, designated as case (yes, disease) or control (no disease):

Exposure to tobacco smoke | Case (cancer: yes) | Control (cancer: no) | Total
Yes, exposed (smoker)     | 40                 | 29                   | All exposed (69)
No, not exposed           | 10                 | 21                   | All not exposed (31)
Total                     | All cases (50)     | All controls (50)    | Total sample (100)

One feature of this 2 x 2 table is apparent immediately: it contains an equal number of cases and controls (50 each), which indicates a case-control study
design (you will learn more about this in Epidemiology).

In a case-control study, the investigator chooses how many cases and controls to sample, so the risk of disease cannot be calculated directly. The odds ratio is used
instead, and it approximates the relative risk when the disease is rare. The odds ratio is the odds of disease in the exposed divided by the odds of disease in the
non-exposed. Using the letters from the table this is:

(a/b) / (c/d) or with the numbers it is: (40/29)/(10/21) = 1.379 / .476 = 2.897

A common shortcut gives the same result: multiply a x d and divide by b x c, here (40 x 21)/(29 x 10) = 840/290 = 2.897.

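The interpretation is that the odds of cancer among smokers in this sample are about 2.9 times the odds among non-smokers. Reading this as "2.9 times as likely" is reasonable only when the disease is rare.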
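Both forms of the odds ratio calculation can be checked with a short Python sketch. Because the SPSS “Risk Estimate” table also reports a 95% confidence interval for the odds ratio, the sketch adds one using the log (Woolf) method; that choice of method is an assumption here, so compare its output with SPSS before relying on it. The function names are hypothetical helpers.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of disease in the exposed (a/b) divided by odds in the unexposed (c/d)."""
    return (a / b) / (c / d)


def odds_ratio_shortcut(a: int, b: int, c: int, d: int) -> float:
    """Cross-product form (a x d) / (b x c); algebraically identical to odds_ratio."""
    return (a * d) / (b * c)


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Approximate 95% CI for the OR by the log (Woolf) method.

    SE of ln(OR) = sqrt(1/a + 1/b + 1/c + 1/d); every cell must be > 0.
    """
    log_or = math.log(odds_ratio_shortcut(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


a, b, c, d = 40, 29, 10, 21   # smoking example from the table above
print(round(odds_ratio(a, b, c, d), 3))            # 2.897
print(round(odds_ratio_shortcut(a, b, c, d), 3))   # 2.897
print(odds_ratio_ci(a, b, c, d))
```

An interval whose lower limit stays above 1 is consistent with a positive association between exposure and disease.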

Risk Ratio or Relative Risk (RR):

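Risk ratios also use 2 x 2 tables, but they mostly come from prospective (cohort) studies, in which people are grouped by exposure and followed over time, so the
numbers of cases and non-cases are not fixed in advance and are usually unequal. Because risk can be measured directly in each exposure group, they provide a more
direct assessment of the relative risk of disease based on exposure. The 2 x 2 table is set up as above, but the calculation for RR differs from that for OR.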

The relative risk compares the risk in the exposed with the risk in the unexposed. The risk in the exposed is the number of exposed people who become ill divided by
the number exposed. The risk in the unexposed is the number of unexposed people who become ill divided by the number unexposed.

So RR = Risk in exposed / Risk in unexposed. Using the 2 x 2 table this is [a/(a + b)]/[c/(c + d)]. With the numbers from our table it is: (40/69)/(10/31) =
0.580/0.323 = 1.8

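Note that this is a much lower estimate than the odds ratio. The gap arises because cancer is common in this table (half of the sample are cases); the OR and RR are close only when the disease is rare. The RR also assumes that all the people in the prospective study were observed for the same length of time.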
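The Python sketch below computes the relative risk for the same table and sets it beside the odds ratio. The second table (4, 996, 2, 998) is hypothetical, chosen only to show that when the disease is rare the two measures nearly coincide. The function names are illustrative helpers.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk in exposed [a/(a + b)] divided by risk in unexposed [c/(c + d)]."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio (a x d) / (b x c)."""
    return (a * d) / (b * c)


# Smoking example: the outcome is common (50% cases), so OR and RR diverge
print(round(relative_risk(40, 29, 10, 21), 2))   # 1.8
print(round(odds_ratio(40, 29, 10, 21), 2))      # 2.9

# Hypothetical rare-disease table: 4 of 1000 exposed and 2 of 1000 unexposed ill
print(round(relative_risk(4, 996, 2, 998), 3))   # 2.0
print(round(odds_ratio(4, 996, 2, 998), 3))      # 2.004
```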

Here is a quick comparison of Odds Ratio and Relative Risk.

Odds Ratio (OR)                                     | Relative Risk (RR)
Case-control studies                                | Cohort studies
Focus is on the exposure of interest                | Focus is on disease occurrence
Best when disease is rare and exposure more common  | Best when exposure of interest is rare and disease more common
From the 2 x 2 table: (a x d)/(b x c)               | From the 2 x 2 table: [a/(a + b)]/[c/(c + d)]

Cumulative Incidence (CI), Incidence Density (ID), and Prevalence (P):

These terms are often confused, so apply the definitions carefully.

Incidence versus Prevalence:

To understand the difference between incidence and prevalence, you need to distinguish a new case from an existing one. When you consider the prevalence of something,
you are taking a snapshot of the current situation. You are not looking at when the disease occurred or how long it has been present. You are simply dividing the
number of people with a given disease or characteristic by the total number of people you are observing at that time.

Prevalence = number with trait / total number

For example, if you wanted to know the prevalence of brown eyes in a room full of people, you would count the number of people with brown eyes and divide this number
by the number of people in the room. With incidence, you are only looking for new instances of that characteristic. It is unlikely that anyone will develop brown eyes
while you are observing the group, so for this example we will use diabetes. Whereas prevalence is the number with diabetes compared to the total at any given time,
incidence is the number who develop diabetes compared to the total while under observation. This introduces the element of time.

Cumulative Incidence and Incidence Density (aka Density Incidence):

The difference between these is how time is handled. In cumulative incidence, a group of people is observed during a set period of time, and the number of people who
develop a characteristic is divided by the number of people available to develop that characteristic:

Cumulative Incidence = number of new cases of a disease or trait / total number of people at risk for the disease or trait

The denominator excludes people who are not at risk for developing the trait, such as those who already have it or who are immune. The observation period is most
often a year. Cumulative incidence is often expressed as the future risk of developing a disease. An example is the prediction that there will be 200 new cases of
diabetes diagnosed in a given town in 2011, based on the actual number of new cases observed in 2010 adjusted for changes in the size of the town’s population since then.

In Incidence Density, the individuals in the groups are not all observed for the same length of time and so the numerator remains the number of new cases but the
denominator includes not just the number of people at risk but the time they have been observed.

Incidence Density = number of new cases of a disease or trait / sum of the time that each person at risk for the disease or trait was observed

The denominator is expressed as person-time, often measured in person-years, and each individual contributes a different amount to it. This approach is often used
when looking at people born at different times. Consider a study conducted in 2000 that looks at the medical history of 100 people since their birth. Each person
either developed the disease (cancer) during their lifetime or did not. Because each individual was born in a different year, each contributes a different amount to
the denominator. Life insurance mortality tables use this type of calculation to determine the risk of death (lifespan) for an individual who has already reached a
certain age.
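To make the person-time idea concrete, here is a minimal Python sketch of incidence density for a small hypothetical cohort in which each person is followed for a different number of years. The Person class and its fields are illustrative assumptions, not part of any standard package.

```python
from dataclasses import dataclass


@dataclass
class Person:
    years_observed: float    # time at risk until diagnosis, loss, or end of study
    developed_disease: bool  # True if this person became a new case


def incidence_density(people: list[Person]) -> float:
    """New cases divided by total person-time at risk (cases per person-year)."""
    new_cases = sum(p.developed_disease for p in people)
    person_years = sum(p.years_observed for p in people)
    return new_cases / person_years


# Hypothetical cohort: each person contributes a different amount of follow-up
cohort = [Person(5, False), Person(3, True), Person(10, False), Person(2, True)]
print(incidence_density(cohort))   # 0.1, i.e. 10 cases per 100 person-years
```

Here 2 new cases occur over 5 + 3 + 10 + 2 = 20 person-years.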

Some sample calculations of Prevalence, Cumulative Incidence, and Incidence Density:

You have the following data:

There is a group of people you have been observing for 2 years to see who develops diabetes. The group started with 200 people, but in the 2 years, several of them
have moved away, died of another disease, or been diagnosed with diabetes. At the end of the study there were 160 people left in your group. In Year 1, 10 of the
people in the group developed diabetes, while in Year 2, 12 people developed diabetes. Assume that there were no cases of diabetes in the group at the beginning of
the 2 years of observation and that all people were at risk of developing the disease.

To calculate the prevalence of diabetes at the end of Year 1, you would divide 10 by 200.

P(end of Year 1) = 10/200 = 5%, assuming there were no losses from the group in that first year

At the end of Year 2, the prevalence would be (10 from Year 1 + 12 from Year 2)/160

P(end of Year 2) = 22/160 = 13.75%
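The two prevalence calculations can be reproduced in a few lines of Python. This is a minimal sketch using the figures given above; the helper name is hypothetical.

```python
def prevalence(with_trait: int, total: int) -> float:
    """Proportion of the observed group that has the trait at one point in time."""
    return with_trait / total


p_year1 = prevalence(10, 200)        # no losses assumed in Year 1
p_year2 = prevalence(10 + 12, 160)   # all cases so far over the 160 remaining
print(f"{p_year1:.2%}, {p_year2:.2%}")   # 5.00%, 13.75%
```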

To calculate the Cumulative Incidence in Year 1, you would again look at the cases from the first year. You will note that the new cases are equal to the total cases
after the first year of observation, as there were no cases when you started the observation.

CI(end of Year 1) = 10 new cases/200 total at risk in Year 1 = 5%. So in this situation CI and P are the same.

In Year 2, however, you are only looking at new cases divided by the number at risk at the start of Year 2. Since 10 people were diagnosed by the start of Year 2, the
total number still at risk of developing the disease is 200 – 10 = 190.

CI(end of Year 2) = 12 new cases/190 total at risk in Year 2 = 6.3%. Note that the CI does not account for changes in the number of people at risk due to dropouts,
but it does account for the reduced number due to those already diagnosed.
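The cumulative incidence calculations look like this in Python. The sketch follows the rule described above: people diagnosed in Year 1 are removed from the Year 2 denominator, but dropouts are not. The helper name is hypothetical.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """New cases during the period divided by the number at risk at its start."""
    return new_cases / at_risk_at_start


start = 200
ci_year1 = cumulative_incidence(10, start)
# Those diagnosed in Year 1 are no longer at risk; dropouts are NOT subtracted.
ci_year2 = cumulative_incidence(12, start - 10)
print(f"{ci_year1:.1%}, {ci_year2:.1%}")   # 5.0%, 6.3%
```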

The most challenging, but also most accurate, calculation is the incidence density. For this you need to add up all of the person-years contributing to the
calculation of risk. In Year 1, the only losses from the at-risk group were the 10 people diagnosed, so 190 people each contributed 1 person-year, for a total of 190.
The 10 diagnosed were diagnosed at different times during the year, so each contributed a fraction of a year to the total. The easiest way to do this is with a table:

Number diagnosed | When diagnosed | Multiplier based on time of diagnosis | Total person-years
4                | 1st quarter    | .25                                   | 1
2                | 2nd quarter    | .50                                   | 1
4                | 3rd quarter    | .75                                   | 3
Total: 10        |                |                                       | 5

This adds 5 person-years to the 190, giving a denominator of 195 person-years for Year 1, so

ID = 10 new cases/195 person-years at risk = 0.051 per person-year (5.1 per 100 person-years)
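The Year 1 person-time table translates directly into a short Python sketch. As in the table, each person diagnosed is assumed to contribute the fraction of the year given by the multiplier for their quarter; the variable names are illustrative.

```python
# Year 1 person-time table: (number diagnosed, multiplier = fraction of year at risk)
diagnosed_by_quarter = [(4, 0.25), (2, 0.50), (4, 0.75)]
never_diagnosed = 190  # each contributes a full person-year

new_cases = sum(n for n, _ in diagnosed_by_quarter)          # 10
partial_years = sum(n * m for n, m in diagnosed_by_quarter)  # 1 + 1 + 3 = 5
person_years = never_diagnosed + partial_years               # 195
id_year1 = new_cases / person_years

print(f"{id_year1:.3f} per person-year = {id_year1 * 100:.1f} per 100 person-years")
# 0.051 per person-year = 5.1 per 100 person-years
```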

During the second year, 12 additional people were diagnosed, and a total of 40 people left the at-risk group through either diagnosis or loss to follow-up. That leaves
150 people who each contributed one person-year and 40 people who each contributed a partial year. Again, you can use a table to calculate their contribution to the total:

Number diagnosed or lost to follow-up | When diagnosed or lost | Multiplier based on time of diagnosis or loss | Total person-years
20                                    | 1st quarter            | .25                                           | 5
8                                     | 2nd quarter            | .50                                           | 4
12                                    | 3rd quarter            | .75                                           | 9
Total: 40                             |                        |                                               | 18

Number left contributing 1 person-year = 150

The total person-time at risk for this calculation is 150 + 18 = 168 person-years.
So ID = 12 new cases/168 person-years at risk = 0.071 per person-year (7.1 per 100 person-years)
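The Year 2 calculation differs in one important way: all 40 people removed during the year shorten the denominator, but only the 12 who were diagnosed count in the numerator. A minimal Python sketch using the figures above (variable names are illustrative):

```python
# Year 2: people leaving the at-risk group (diagnosed or lost), by quarter
removed_by_quarter = [(20, 0.25), (8, 0.50), (12, 0.75)]  # 40 people in total
at_risk_start = 190                                        # still at risk after Year 1

removed = sum(n for n, _ in removed_by_quarter)            # 40
full_years = at_risk_start - removed                       # 150
partial_years = sum(n * m for n, m in removed_by_quarter)  # 5 + 4 + 9 = 18
person_years = full_years + partial_years                  # 168
new_cases = 12  # only diagnoses count in the numerator, not dropouts
id_year2 = new_cases / person_years

print(f"{id_year2:.3f} per person-year = {id_year2 * 100:.1f} per 100 person-years")
# 0.071 per person-year = 7.1 per 100 person-years
```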

To sum it up:

Year 1:

Prevalence = 5%

Cumulative Incidence = 5%

Incidence Density = 5.1 per 100 person-years

Year 2:

Prevalence = 13.75%

Cumulative Incidence = 6.3%

Incidence Density = 7.1 per 100 person-years

REFERENCES

http://www.pnas.org/content/28/3/94.full.pdf
Gerstman, B. B. (2015). Basic biostatistics: statistics for public health practice (2nd ed., Custom Laureate Edition). Sudbury, Mass.: Jones and Bartlett Learning.