Thursday, July 28, 2016

Unizor - Statistics - Correlation Problem 2





Notes to a video lecture on http://www.unizor.com


Statistical Correlation - Problems 2


Let's construct an example that shows how dependency and correlation are related.
We know that independent variables have zero correlation, while linearly dependent variables have a correlation whose absolute value is 1.
Is the converse true?

Problem A

Make up the results of an experiment with two dependent random variables whose sample correlation equals zero.

Solution

To construct such an example, it is sufficient to come up with sample values of random variables S and T that, on one hand, produce zero correlation but, on the other hand, violate some characteristic property of independent variables. For example, the conditional probability of S taking some value, given that T took some value, may differ from the unconditional probability.

Let's simplify it to a minimum and assume that S has only two possible values (a, b) and T has three possible values (c, d, x). The possible combinations of their values are:
(a,c), (a,d), (a,x), (b,c), (b,d), (b,x).

Assume that in 100 conducted experiments the combination (a,c) never occurred, (a,d) occurred 50 times, (a,x) never, (b,c) 25 times, (b,d) never, and (b,x) 25 times.

It means that, if S=a (with frequency 50/100=0.5), T always takes value d. If, however, S=b (also with frequency 50/100=0.5), T can take either value c or value x, each of these combinations occurring with frequency 25/100=0.25.

Values of random variables S and T and the number of times they occur are in the table below.
        T=c   T=d   T=x   Σ(S)
S=a       0    50     0    50
S=b      25     0    25    50
Σ(T)     25    50    25

To satisfy the requirement of zero correlation we have to make sure that their covariance is zero, that is
E(S·T) = E(S)·E(T), where
E(S·T) = (50ad + 25bc + 25bx)/100
E(S)·E(T) = [(50a + 50b)/100]·[(25c + 50d + 25x)/100]


To simplify it even further, let's assign some concrete values to variables a, b, c, d and find the value of x from the equation above.
Set a=2, b=4, c=8, d=16.

New values of random variables S and T and numbers of times they occur are in this new table below.
        T=8   T=16  T=x   Σ(S)
S=2       0    50     0    50
S=4      25     0    25    50
Σ(T)     25    50    25


Then our equation would look like this:
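E(S·T) = (1600 + 800 + 100x)/100 = 24 + x
E(S)·E(T) = 3·(10 + 0.25x) = 30 + 0.75x
so the condition E(S·T) = E(S)·E(T) becomes
24 + x = 30 + 0.75x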


Solving this equation (0.25x = 6) gives the following value for the unknown x:
x=24
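
The same value can be recovered numerically. The following is only a minimal sketch in Python, assuming the counts and the values a=2, b=4, c=8, d=16 given above; the helper name covariance and the two-point root finding are illustrative choices, not part of the lecture. It relies on the fact that the covariance is a linear function of x.

```python
from fractions import Fraction

def covariance(x):
    """Sample covariance of S and T when the third value of T is x."""
    # (value of S, value of T, number of occurrences) out of 100 experiments
    rows = [(2, 8, 0), (2, 16, 50), (2, x, 0),
            (4, 8, 25), (4, 16, 0), (4, x, 25)]
    n = sum(k for _, _, k in rows)
    e_s = Fraction(sum(s * k for s, _, k in rows), n)
    e_t = Fraction(sum(t * k for _, t, k in rows), n)
    e_st = Fraction(sum(s * t * k for s, t, k in rows), n)
    return e_st - e_s * e_t

# The covariance is linear in x, so two evaluations determine its root.
c0 = covariance(0)
slope = covariance(1) - c0
x_zero = -c0 / slope
print(x_zero)  # 24
```

Exact fractions are used instead of floating point so that the covariance comes out as exactly zero rather than approximately zero.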

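So, for x=24 the covariance of our random variables S and T equals zero, and therefore so does their correlation.
At the same time, S and T are dependent: T=16 in every experiment where S=2, while overall T=16 occurs with frequency only 50/100=0.5, so the conditional and unconditional frequencies differ.
This proves that, while independence implies covariance = 0, the converse is not true. There are dependent random variables with covariance = 0.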
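
Both halves of this conclusion can be checked directly from the counts. The sketch below is an illustration under the assumptions of this example (a=2, b=4, c=8, d=16, x=24 and the 100 observations listed earlier); the helper name freq is hypothetical. It compares the frequency of T=16 with its frequency among the experiments where S=2, and then computes the covariance.

```python
from fractions import Fraction

# (value of S, value of T, number of occurrences); zero-count pairs omitted
rows = [(2, 16, 50), (4, 8, 25), (4, 24, 25)]
n = sum(k for _, _, k in rows)

def freq(pred):
    """Relative frequency of the observations satisfying pred(s, t)."""
    return Fraction(sum(k for s, t, k in rows if pred(s, t)), n)

# Dependence: conditional and unconditional frequencies of T=16 differ.
p_t16 = freq(lambda s, t: t == 16)
p_s2 = freq(lambda s, t: s == 2)
p_t16_given_s2 = freq(lambda s, t: s == 2 and t == 16) / p_s2

# Zero covariance: E(S·T) equals E(S)·E(T).
e_s = sum(Fraction(s * k, n) for s, _, k in rows)
e_t = sum(Fraction(t * k, n) for _, t, k in rows)
e_st = sum(Fraction(s * t * k, n) for s, t, k in rows)
cov = e_st - e_s * e_t

print(p_t16, p_t16_given_s2, cov)  # 1/2 1 0
```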
