Wednesday, August 3, 2016

Unizor - Statistics - Correlation Problem 3





Notes to a video lecture on http://www.unizor.com


Statistical Correlation - Problems 3


To establish the usefulness of a new vaccine, the following statistical data were collected.
Out of 2000 people, 1000 randomly chosen people were inoculated with the vaccine and the other 1000 received a placebo.
There were 50 cases of illness related to this virus among the inoculated people and 300 among the people who received the placebo.
Does this vaccine work?

Solution

Let's put our information into a table.
           Sick   Not Sick   Total
Vaccine      50        950    1000
Placebo     300        700    1000
Total       350       1650    2000

Consider a random variable ξ taking values 1 and 0 for each person depending on whether this person was inoculated (ξ=1) or not (ξ=0). Since 1000 randomly chosen people out of 2000 were inoculated, the probability that ξ takes the value 1 is 1000/2000=0.5, and the probability that it takes the value 0 is also 0.5.

Consider a random variable η taking values 1 and 0 for each person depending on whether this person got sick (η=1) or not (η=0). Since 350 people out of 2000 got sick, the probability that η takes the value 1 is 350/2000=0.175, and the probability that it takes the value 0 is 1−0.175=0.825.

If the vaccine does not help to resist the virus, the inoculation random variable ξ and the sickness random variable η should be independent random variables, and their correlation should be equal to 0. If the vaccine works well, the correlation should be negative, since vaccination (ξ=1) and sickness (η=1) work against each other.
In any case, it's interesting to find out the value of statistical correlation.

E(ξ·η) =
= 1·50/2000 + 0·950/2000 +
+ 0·300/2000 + 0·700/2000 =
= 0.025


E(ξ) =
= 1·0.5 + 0·0.5 =
= 0.5


Var(ξ) =
= (1−0.5)²·0.5 + (0−0.5)²·0.5 =
= 0.25


E(η) =
= 1·0.175 + 0·0.825 =
= 0.175


Var(η) =
= (1−0.175)²·0.175 +
+ (0−0.175)²·0.825 =
= 0.144375


Cov(ξ,η) =
E(ξ·η)−E(ξ)·E(η) =
= 0.025−0.5·0.175 =
= −0.0625


R(ξ,η) =
= Cov(ξ,η)/√[Var(ξ)·Var(η)] =
= −0.0625/√(0.25·0.144375) ≅
≅ −0.328976
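
The same steps can be sketched in Python as a cross-check of the arithmetic. This is only an illustration: the function name correlation_2x2 and its parameter names are made up for this sketch and do not come from the lecture. It uses the fact that a random variable taking only values 1 and 0, with probability p of taking the value 1, has variance p·(1−p).

```python
from math import sqrt

def correlation_2x2(vacc_sick, vacc_healthy, plac_sick, plac_healthy):
    """Correlation of two 0/1 variables from the four counts of a 2x2 table.

    xi  = 1 for vaccinated, 0 for placebo (hypothetical naming).
    eta = 1 for sick, 0 for not sick.
    """
    n = vacc_sick + vacc_healthy + plac_sick + plac_healthy
    p_xi = (vacc_sick + vacc_healthy) / n   # P(xi = 1) = E(xi)
    p_eta = (vacc_sick + plac_sick) / n     # P(eta = 1) = E(eta)
    e_prod = vacc_sick / n                  # E(xi*eta): only xi = eta = 1 contributes
    cov = e_prod - p_xi * p_eta             # Cov = E(xi*eta) - E(xi)*E(eta)
    var_xi = p_xi * (1 - p_xi)              # variance of a 0/1 variable
    var_eta = p_eta * (1 - p_eta)
    return cov / sqrt(var_xi * var_eta)

r = correlation_2x2(50, 950, 300, 700)
print(round(r, 6))  # about -0.328976, the value obtained by hand
```

Passing the four counts of any other 2x2 table to the same function gives the correlation for that table.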


As we see, a non-zero correlation exists. It is noticeable but not very strong, and it is negative, which means that an increased value of one random variable is related to a decreased value of the other. That is, vaccination correlates with not getting sick and non-vaccination correlates with getting sick, as expected.

Consider now two extreme cases.

Case A

If the proportion of sick people in the vaccinated group is the same as the proportion of sick people in the whole observed population, we should assume that the vaccine has no effect, and that the random variable ξ, which reflects inoculation of a person, and the random variable η, which reflects the person's health status, are independent. Let's determine what the number x of inoculated sick people should be:
x/1000 = 350/2000
x = 175.
The table of results now looks as follows:
           Sick   Not Sick   Total
Vaccine     175        825    1000
Placebo     175        825    1000
Total       350       1650    2000

Then
E(ξ·η) =
= 1·175/2000+0·825/2000+
+0·175/2000+0·825/2000 =
=0.0875


E(ξ) =
= 1·0.5 + 0·0.5 =
= 0.5


E(η) =
= 1·0.175 + 0·0.825 =
= 0.175


Cov(ξ,η) =
E(ξ·η)−E(ξ)·E(η) =
= 0.0875−0.5·0.175 = 0


Since the covariance is zero, the correlation is zero as well.
Zero correlation indicates that there is no noticeable effect of the vaccine.
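
As a small sanity check of Case A, the sketch below redoes the calculation with exact fractions, so the covariance comes out as exactly zero rather than as a rounded decimal. The variable names are chosen for this illustration only; the numbers are those of the problem.

```python
from fractions import Fraction

total = 2000       # all observed people
vaccinated = 1000  # people who received the vaccine
sick = 350         # all sick people

# Under independence the sick fraction is the same in every group:
# x / vaccinated = sick / total.
x = Fraction(sick, total) * vaccinated  # sick people among the vaccinated

e_xi = Fraction(vaccinated, total)      # E(xi)  = 1/2
e_eta = Fraction(sick, total)           # E(eta) = 7/40
e_prod = x / total                      # E(xi*eta) = P(xi = 1 and eta = 1)
cov = e_prod - e_xi * e_eta             # Cov = E(xi*eta) - E(xi)*E(eta)

print(x, cov)  # 175 0
```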


Case B

If the number of sick people in the vaccinated group is zero, we should assume that the vaccine is more effective than in the main problem, where there were 50 sick people in this group, and that the random variable ξ, which reflects inoculation of a person, and the random variable η, which reflects the person's health status, are more strongly related to each other. The table of results now looks as follows:
           Sick   Not Sick   Total
Vaccine       0       1000    1000
Placebo     350        650    1000
Total       350       1650    2000

E(ξ·η) =
= 1·0/2000+0·1000/2000+
+0·350/2000+0·650/2000 =
=0


E(ξ) =
= 1·0.5 + 0·0.5 =
= 0.5


E(η) =
= 1·0.175 + 0·0.825 =
= 0.175


Cov(ξ,η) =
E(ξ·η)−E(ξ)·E(η) =
= 0−0.5·0.175 = -0.0875


Var(ξ) =
= (1−0.5)²·0.5 + (0−0.5)²·0.5 =
= 0.25


Var(η) =
= (1−0.175)²·0.175 +
+ (0−0.175)²·0.825 =
= 0.144375


R(ξ,η) =
= Cov(ξ,η)/√[Var(ξ)·Var(η)] =
= −0.0875/√(0.25·0.144375) ≅
≅ −0.460566


As we see, the correlation is stronger than when there were 50 sick people among the vaccinated. It is still negative, which means that an increased value of one random variable is related to a decreased value of the other. That is, vaccination correlates with not getting sick and non-vaccination correlates with getting sick, as expected.
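
The main problem, Case A and Case B share the same totals (1000 people in each group and 350 sick people overall), so E(ξ), E(η), Var(ξ) and Var(η) are the same in all three. Only E(ξ·η) = x/2000 changes with the number x of sick people among the vaccinated. The sketch below makes this explicit. The function name and its default arguments are assumptions made for this illustration.

```python
from math import sqrt

def correlation(x, group=1000, sick=350):
    """Correlation when x of the `group` vaccinated people are sick.

    Assumes two groups of equal size `group` and `sick` sick people overall,
    so the remaining sick - x sick people are in the placebo group.
    """
    n = 2 * group
    p_xi = group / n             # E(xi), the same for every x
    p_eta = sick / n             # E(eta), the same for every x
    cov = x / n - p_xi * p_eta   # the only part that depends on x
    return cov / sqrt(p_xi * (1 - p_xi) * p_eta * (1 - p_eta))

# Case B, the main problem, Case A
for x in (0, 50, 175):
    print(x, round(correlation(x), 6))
```

With these inputs the printed correlations should be close to −0.460566, −0.328976 and 0 respectively. Because the denominator is fixed, the correlation changes linearly with x.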

The correlation did not reach the value of −1, which would indicate an absolutely rigid dependency between vaccination and not getting sick. The reason is that other factors, such as exposure to the virus and the immune system, allowed some non-vaccinated people to stay healthy. If all non-vaccinated people got sick and all vaccinated people stayed healthy, the correlation would have been −1. Check it!
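
One possible way to do this check numerically is sketched below. Instead of working from a table, it builds the list of 0/1 values for all 2000 people and computes the correlation directly from the definitions of expectation, variance and covariance. The helper name pearson is chosen for this illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Correlation of two equally long lists, each element equally likely."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

# 1000 vaccinated people, all healthy (xi = 1, eta = 0),
# followed by 1000 non-vaccinated people, all sick (xi = 0, eta = 1).
xi = [1] * 1000 + [0] * 1000
eta = [0] * 1000 + [1] * 1000

r = pearson(xi, eta)
print(r)  # -1.0
```

In this extreme case η = 1 − ξ for every person, so Var(η) = Var(ξ) = 0.25 and Cov(ξ,η) = 0 − 0.5·0.5 = −0.25, which gives R(ξ,η) = −0.25/0.25 = −1.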
