*Notes to a video lecture on http://www.unizor.com*

__Statistical Correlation - Problems 3__

To establish the usefulness of a new vaccine, the following statistical data were collected.

Out of 2000 people, 1000 randomly chosen people were inoculated with the vaccine and the other 1000 received a placebo.

There were 50 cases of illness related to this virus among the inoculated and 300 among the people who received the placebo.

Does this vaccine work?

*Solution*

Let's put our information into a table.

| | Sick | Not Sick | Total |
|---|---|---|---|
| Vaccine | 50 | 950 | 1000 |
| Placebo | 300 | 700 | 1000 |
| Total | 350 | 1650 | 2000 |

Consider a random variable *ξ* taking values *1* and *0* for each person depending on whether this person was inoculated (*ξ=1*) or not (*ξ=0*). Since 1000 randomly chosen people out of 2000 were inoculated, the probability that *ξ* takes the value 1 is *1000/2000=0.5*, and the probability that it takes the value 0 is the same, *0.5*.

Consider a random variable *η* taking values *1* and *0* for each person depending on whether this person got sick (*η=1*) or not (*η=0*). Since 350 people out of 2000 got sick, the probability that *η* takes the value 1 is *350/2000=0.175*, and the probability that it takes the value 0 is *1−0.175=0.825*.
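The two marginal distributions above can be read directly off the table. The following is a minimal Python sketch of that step; the dictionary layout and the variable names (`counts`, `p_xi_1`, `p_eta_1`) are illustrative assumptions, not part of the lecture.

```python
# Counts from the problem statement, keyed by (group, health status).
counts = {
    ("vaccine", "sick"): 50,
    ("vaccine", "not sick"): 950,
    ("placebo", "sick"): 300,
    ("placebo", "not sick"): 700,
}

total = sum(counts.values())  # 2000 people observed

# P(xi = 1): share of inoculated people in the whole sample.
p_xi_1 = sum(n for (group, _), n in counts.items() if group == "vaccine") / total

# P(eta = 1): share of people who got sick in the whole sample.
p_eta_1 = sum(n for (_, status), n in counts.items() if status == "sick") / total

print("P(xi=1) =", p_xi_1, " P(xi=0) =", 1 - p_xi_1)
print("P(eta=1) =", p_eta_1, " P(eta=0) =", 1 - p_eta_1)
```

Both probabilities are plain relative frequencies, which is the same estimate the text uses.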

If the vaccine does not help to resist the virus, the inoculation random variable *ξ* and the sickness random variable *η* should be *independent* random variables, and their correlation should be equal to 0. If the vaccine works well, the correlation should be negative, since vaccination (*ξ=1*) and sickness (*η=1*) work against each other.

In any case, it's interesting to find out the value of statistical correlation.

**E**(ξ·η) = 1·50/2000 + 0·950/2000 + 0·300/2000 + 0·700/2000 = 0.025

**E**(ξ) = 1·0.5 + 0·0.5 = 0.5

**Var**(ξ) = (1−0.5)²·0.5 + (0−0.5)²·0.5 = 0.25

**E**(η) = 1·0.175 + 0·0.825 = 0.175

**Var**(η) = (1−0.175)²·0.175 + (0−0.175)²·0.825 = 0.144375

**Cov**(ξ,η) = **E**(ξ·η) − **E**(ξ)·**E**(η) = 0.025 − 0.5·0.175 = −0.0625

**R**(ξ,η) = **Cov**(ξ,η) / √(**Var**(ξ)·**Var**(η)) = −0.0625 / √(0.25·0.144375) ≅ −0.328976

As we see, a non-zero correlation exists. It is noticeable but not very strong, and it is negative, which means that a larger value of one random variable is associated with a smaller value of the other. That is, vaccination is correlated with not getting sick and non-vaccination is correlated with getting sick, as expected.
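The same calculation can be reproduced numerically. Below is a minimal Python sketch; the function name `binary_correlation` and its argument order are assumptions made for illustration. It uses the fact that the variance of a 0/1 variable with P(1) = p simplifies to p·(1−p), which agrees with the expanded form used in the text.

```python
from math import sqrt

def binary_correlation(vacc_sick, vacc_healthy, plac_sick, plac_healthy):
    """Correlation of xi (1 = vaccinated) and eta (1 = sick) from 2x2 counts."""
    n = vacc_sick + vacc_healthy + plac_sick + plac_healthy
    e_xi = (vacc_sick + vacc_healthy) / n   # E(xi) = P(xi = 1)
    e_eta = (vacc_sick + plac_sick) / n     # E(eta) = P(eta = 1)
    e_product = vacc_sick / n               # E(xi*eta): only xi = eta = 1 contributes
    cov = e_product - e_xi * e_eta
    var_xi = e_xi * (1 - e_xi)              # variance of a 0/1 variable
    var_eta = e_eta * (1 - e_eta)
    return cov / sqrt(var_xi * var_eta)

r = binary_correlation(50, 950, 300, 700)
print(round(r, 6))
```

With the counts from the problem, the result should agree with the value of about −0.328976 obtained by hand.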

Consider now two extreme cases.

*Case A*

If the proportion of sick people in the vaccinated group is the same as the proportion of sick people in the whole observed population, we should assume that the vaccine has no effect and that the random variables *ξ*, which reflects inoculation of a person, and *η*, which reflects that person's health status, are independent. Let's determine what the number of inoculated sick people *x* should be:

*x/1000 = 350/2000*

*x = 175*.

The table of results now looks as follows.

| | Sick | Not Sick | Total |
|---|---|---|---|
| Vaccine | 175 | 825 | 1000 |
| Placebo | 175 | 825 | 1000 |
| Total | 350 | 1650 | 2000 |

Then

**E**(ξ·η) = 1·175/2000 + 0·825/2000 + 0·175/2000 + 0·825/2000 = 0.0875

**E**(ξ) = 1·0.5 + 0·0.5 = 0.5

**E**(η) = 1·0.175 + 0·0.825 = 0.175

**Cov**(ξ,η) = **E**(ξ·η) − **E**(ξ)·**E**(η) = 0.0875 − 0.5·0.175 = 0

Since the *covariance* is zero, the *correlation* is zero as well.

Zero correlation indicates that the vaccine has no noticeable effect.
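Case A can also be checked with exact arithmetic, which avoids any floating-point rounding in the zero result. This is a small sketch using Python's standard `fractions` module; the variable names are illustrative assumptions.

```python
from fractions import Fraction

total, vaccinated, sick = 2000, 1000, 350

# Under independence, the sick share among the vaccinated equals the overall
# sick share: x / vaccinated = sick / total.
x = Fraction(vaccinated * sick, total)

e_product = x / total                 # E(xi*eta) = P(xi = 1 and eta = 1)
e_xi = Fraction(vaccinated, total)    # E(xi)
e_eta = Fraction(sick, total)         # E(eta)
cov = e_product - e_xi * e_eta        # Cov(xi, eta)

print("x =", x, " Cov =", cov)
```

Because the covariance is exactly zero, the correlation is zero regardless of the variances.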

*Case B*

If the number of sick people in the vaccinated group is zero, we should assume that the vaccine is more effective than in the main problem, where there were 50 sick people among the vaccinated, and that the random variables *ξ*, which reflects inoculation of a person, and *η*, which reflects that person's health status, are more strongly related to each other. The table of results now looks as follows.

| | Sick | Not Sick | Total |
|---|---|---|---|
| Vaccine | 0 | 1000 | 1000 |
| Placebo | 350 | 650 | 1000 |
| Total | 350 | 1650 | 2000 |

**E**(ξ·η) = 1·0/2000 + 0·1000/2000 + 0·350/2000 + 0·650/2000 = 0

**E**(ξ) = 1·0.5 + 0·0.5 = 0.5

**E**(η) = 1·0.175 + 0·0.825 = 0.175

**Cov**(ξ,η) = **E**(ξ·η) − **E**(ξ)·**E**(η) = 0 − 0.5·0.175 = −0.0875

**Var**(ξ) = (1−0.5)²·0.5 + (0−0.5)²·0.5 = 0.25

**Var**(η) = (1−0.175)²·0.175 + (0−0.175)²·0.825 = 0.144375

**R**(ξ,η) = **Cov**(ξ,η) / √(**Var**(ξ)·**Var**(η)) = −0.0875 / √(0.25·0.144375) ≅ −0.460566

As we see, the correlation is stronger than when there were 50 sick people among the vaccinated. It is still negative, which means that a larger value of one random variable is associated with a smaller value of the other. That is, vaccination is correlated with not getting sick and non-vaccination is correlated with getting sick, as expected.

The correlation did not reach the value of −1, which would indicate an absolutely rigid dependency between vaccination and not getting sick. The reason is that other factors, such as exposure to the virus and the state of the immune system, kept some non-vaccinated people healthy. If all non-vaccinated people got sick and all vaccinated people stayed healthy, the correlation would have been −1. Check it!
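One way to carry out this check is numerically. The sketch below is only an illustration: the helper name `correlation` and the nested-tuple table layout are assumptions. It evaluates Case B and then the hypothetical table in which all 1000 vaccinated people stay healthy and all 1000 non-vaccinated people get sick.

```python
from math import sqrt

def correlation(table):
    """table = ((vaccinated sick, vaccinated healthy),
                (placebo sick,    placebo healthy))."""
    (a, b), (c, d) = table
    n = a + b + c + d
    e_xi = (a + b) / n             # E(xi) = P(vaccinated)
    e_eta = (a + c) / n            # E(eta) = P(sick)
    cov = a / n - e_xi * e_eta     # E(xi*eta) - E(xi)*E(eta)
    return cov / sqrt(e_xi * (1 - e_xi) * e_eta * (1 - e_eta))

case_b = correlation(((0, 1000), (350, 650)))
perfect = correlation(((0, 1000), (1000, 0)))

print(round(case_b, 6), perfect)
```

In the hypothetical table, **E**(ξ·η) = 0, **E**(ξ) = **E**(η) = 0.5, and both variances equal 0.25, so the covariance is −0.25 and the ratio is −0.25/√(0.25·0.25) = −1.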
