## Thursday, July 28, 2016

### Unizor - Statistics - Correlation Problem 2

Notes to a video lecture on http://www.unizor.com

Statistical Correlation -
Problems 2

Let's work through an example that shows how dependency and correlation are related.
We know that independent variables have zero correlation, while linearly dependent variables have correlation 1 in absolute value.
Is the converse true?

Problem A

Make up sample values of an experiment with two dependent random variables whose sample correlation equals zero.

Solution

To make such an example, it is sufficient to come up with sample values of random variables S and T that, on one hand, produce zero correlation but, on the other hand, violate some characteristic property of independent variables: for example, the equality between the conditional and unconditional probabilities of S taking some value when T took some value.

Let's simplify it to a minimum and consider that we have only two possible observations of S (a, b) and three possible observations of T (c, d, x). Possible combinations of their values are:
(a,c), (a,d), (a,x), (b,c), (b,d), (b,x).

Assume that in 100 conducted experiments the combination (a,c) never occurred, (a,d) occurred 50 times, (a,x) never, (b,c) 25 times, (b,d) never, and (b,x) 25 times.

It means that, if S=a (with frequency 50/100=0.5), T unconditionally takes the value d. If, however, S=b (also with frequency 50/100=0.5), T can take either the value c (with frequency 25/100=0.25) or x (also with frequency 25/100=0.25).

Values of random variables S and T, and the numbers of times each combination occurred, are in the table below.

|      | T=c | T=d | T=x | Σ(S) |
|------|-----|-----|-----|------|
| S=a  | 0   | 50  | 0   | 50   |
| S=b  | 25  | 0   | 25  | 50   |
| Σ(T) | 25  | 50  | 25  | 100  |

To satisfy the requirement of zero correlation we have to make sure that their covariance is zero, that is, E(S·T) = E(S)·E(T), where, from the table,
E(S·T) =
= (50·a·d+25·b·c+25·b·x)/100
E(S)·E(T) =
= [(50a+50b)/100]·
·[(25c+50d+25x)/100]

To simplify it even further, let's assign some concrete values to variables a, b, c, d and find the value of x from the equation above.
Set a=2, b=4, c=8, d=16.

New values of random variables S and T, and the numbers of times they occur, are in this new table below.

|      | T=8 | T=16 | T=x | Σ(S) |
|------|-----|------|-----|------|
| S=2  | 0   | 50   | 0   | 50   |
| S=4  | 25  | 0    | 25  | 50   |
| Σ(T) | 25  | 50   | 25  | 100  |

Then our equation looks like this:
E(S·T) = 24 + x
E(S)·E(T) = 30 + 0.75x
that is,
24 + x = 30 + 0.75x

Solving this equation leads to the following value for an unknown x:
x=24

So, for x=24 the covariance of our dependent random variables S and T equals zero.
This proves that, while independence implies covariance = 0, the converse is not true. There are dependent random variables with covariance = 0.
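The construction can be verified with a short computation; this sketch (the variable names are ours) recomputes the sample covariance from the frequency table with x = 24:

```python
# Frequency table for the dependent variables S and T from Problem A:
# S takes values 2 and 4; T takes values 8, 16 and x=24.
pairs = {(2, 8): 0, (2, 16): 50, (2, 24): 0,
         (4, 8): 25, (4, 16): 0, (4, 24): 25}
n = sum(pairs.values())  # 100 experiments

e_s  = sum(s * k for (s, _), k in pairs.items()) / n
e_t  = sum(t * k for (_, t), k in pairs.items()) / n
e_st = sum(s * t * k for (s, t), k in pairs.items()) / n

cov = e_st - e_s * e_t
print(cov)  # 0.0 — zero covariance

# Yet S and T are dependent: given S=2, T is always 16,
# so P(T=16 | S=2) = 1 differs from P(T=16) = 0.5.
```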

## Monday, July 25, 2016

### Unizor - Random Variables - Problem 7

Notes to a video lecture on http://www.unizor.com

Random Variables
Problems 7 (Covariance)

As always, try to solve any problems presented on this Web site just by yourself and check against the answers provided.
Only then study the suggested solutions.

Problem

Consider two random variables, ξ and η, not necessarily independent, each taking no more than two different values and having a known mutual distribution of probabilities.

Prove that, if covariance between them is zero, then either they are independent random variables or one of them is a constant and can take only one possible value with probability 1.

Solution:

Assume values of ξ are a, b and values of η are c, d.
Let's organize their mutual probabilities into a table:

|     | η=c | η=d |
|-----|-----|-----|
| ξ=a | p   | q   |
| ξ=b | u   | v   |
So, for instance,
P{ξ=a AND η=c} = p
It's important to notice that
p+q+u+v=1

Having zero covariance implies the following equation:
Cov(ξ,η) = E(ξ·η)−E(ξ)·E(η) = 0
Or, equivalently,
E(ξ·η)=E(ξ)·E(η)

Let's calculate each component separately.
E(ξ) = a(p+q)+b(u+v)
E(η) = c(p+u)+d(q+v)
E(ξ·η) = a·c·p+a·d·q+b·c·u+b·d·v

So, we have an equation:
a·c·p+a·d·q+b·c·u+b·d·v =
= [a(p+q)+b(u+v)]·[c(p+u)+d(q+v)]

At this time we suggest multiplying the left side by the multiplier p+q+u+v, which equals 1, and doing the tedious job of opening all the parentheses, canceling and grouping.

Then, after many transformations, we will arrive at the equation
(a−b)(c−d)(pv−qu)=0

There are three cases when this equation is satisfied.

Case 1 (trivial): a = b
This means, ξ is a constant.

Case 2 (trivial): c = d
This means, η is a constant.

Case 3: pv = qu
This is a more interesting case, which will lead us to independence between ξ and η.
To prove independence, we have to prove that all conditional probabilities for our random variables to take different values equal the corresponding unconditional ones or, equivalently, that the probability of them simultaneously taking some pair of their values equals the product of the corresponding unconditional probabilities.
Let's prove the latter for ξ taking the value a and η taking the value c.
Here is how.
Let's take an obvious equation
1 = p+q+u+v
and multiply it by p:
p = p²+pq+pu+pv
Since we consider the case when pv = qu, replace pv with qu in the last equation:
p = p²+pq+pu+qu = (p+q)(p+u)

Notice that
p = P(ξ=a AND η=c)
p+q = P(ξ=a)
p+u = P(η=c)

Therefore, we have proven that
P(ξ=a AND η=c) = P(ξ=a) · P(η=c)

Analogously, we can prove all the equations needed to establish independence between ξ and η.

End of proof.
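The factored identity and the Case 3 conclusion can be checked numerically; the values below are arbitrary, chosen so that p+q+u+v = 1 and pv = qu:

```python
# Values of ξ and η and a joint distribution satisfying pv = qu
a, b, c, d = 3.0, 7.0, 2.0, 5.0
p, q, u, v = 0.1, 0.3, 0.15, 0.45   # p·v = 0.045 = q·u, and p+q+u+v = 1

e_xi   = a * (p + q) + b * (u + v)          # E(ξ)
e_eta  = c * (p + u) + d * (q + v)          # E(η)
e_prod = a*c*p + a*d*q + b*c*u + b*d*v      # E(ξ·η)

cov = e_prod - e_xi * e_eta
identity = (a - b) * (c - d) * (p*v - q*u)
assert abs(cov - identity) < 1e-12          # covariance equals the factored form

# pv = qu forces independence: the joint probability factors into marginals
assert abs(p - (p + q) * (p + u)) < 1e-12   # P(ξ=a, η=c) = P(ξ=a)·P(η=c)
```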

## Friday, July 22, 2016

### Unizor - Statistics - Correlation Problems 1

Notes to a video lecture on http://www.unizor.com

Statistical Correlation -
Problems 1

Assume two different experiments with numerical results are performed under the same conditions.
The statistical results of these experiments are used to determine whether there is a dependency between them.

For example, we measure the amount of salt added to the cold water (random variable S) and the time it takes to boil it (random variable T).

The results of numerous experiments are in a table, where each row corresponds to an observed value of the first experiment (S), each column corresponds to an observed value of the second experiment (T) and on a crossing of a row and a column there is a number of times when corresponding results of the first and the second experiments occurred.

Using these results, calculate the correlation coefficient between the random variables representing these two experiments.

Problem A

Values of random variables S and T are in the table below.

|     | T=101 | T=102 | T=104 |
|-----|-------|-------|-------|
| S=1 | 40    | 0     | 0     |
| S=2 | 0     | 35    | 0     |
| S=4 | 0     | 0     | 25    |
Calculate correlation coefficient R(S,T) of these two random variables.

Solution

To calculate the correlation coefficient R(S,T) we need to calculate the following mathematical expectations E() and variances Var():
E(S) - mean value of S
E(T) - mean value of T
E(S·T) - mean value of S·T
Var(S) - variance of S
Var(T) - variance of T
and put them into the formula for correlation R(S,T):
R(S,T) =
Cov(S,T)/√[Var(S)·Var(T)]

where
Cov(S,T) = E(S·T)−E(S)·E(T)

S=1 in 40+0+0=40 cases,
S=2 in 0+35+0=35 cases,
S=4 in 0+0+25=25 cases,
Total number of observations
N = 40+35+25 = 100
Therefore, mean
E(S) =
= (1·40+2·35+4·25)/100 = 2.1

T=101 in 40+0+0=40 cases,
T=102 in 0+35+0=35 cases,
T=104 in 0+0+25=25 cases,
Total number of observations is still the same
N = 40+35+25 = 100
Therefore, mean
E(T) =
=(101·40+102·35+104·25)/100
= 102.1

There are only three possible combinations of simultaneous values of S and T:
S·T=1·101=101 in 40+0+0=40 cases,
S·T=2·102=204 in 0+35+0=35 cases,
S·T=4·104=416 in 0+0+25=25 cases,
Total number of observations is still the same
N = 40+35+25 = 100
Therefore, mean
E(S·T) =
=(101·40+204·35+416·25)/100
= 215.8

Now we can calculate the covariance between S and T:
Cov(S,T) =
= 215.8−2.1·102.1 = 1.39

Next is the calculation of the variances of S and T.

Var(S) =
= [40·(1−2.1)² +
+ 35·(2−2.1)² +
+ 25·(4−2.1)²] / 100 =
= 1.39

Var(T) =
= [40·(101−102.1)² +
+ 35·(102−102.1)² +
+ 25·(104−102.1)²] / 100 =
= 1.39

The correlation coefficient between S and T is
R(S,T) = 1.39 / √(1.39·1.39) = 1

This result might have been predicted since, obviously, within the framework of our experiments there is a linear dependency between S and T:
T = 100 + S
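The whole calculation can be reproduced with a few lines of code; this sketch recomputes R(S,T) from the frequency table:

```python
from math import sqrt

# Frequency table from Problem A: (S, T) pair -> number of occurrences
counts = {(1, 101): 40, (2, 102): 35, (4, 104): 25}
n = sum(counts.values())  # 100 observations

e_s   = sum(s * k for (s, _), k in counts.items()) / n
e_t   = sum(t * k for (_, t), k in counts.items()) / n
e_st  = sum(s * t * k for (s, t), k in counts.items()) / n
var_s = sum((s - e_s)**2 * k for (s, _), k in counts.items()) / n
var_t = sum((t - e_t)**2 * k for (_, t), k in counts.items()) / n

cov = e_st - e_s * e_t
r = cov / sqrt(var_s * var_t)
print(round(r, 4))  # 1.0, since T = 100 + S in every observation
```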

Problem B

As we know, there is a limit to the amount of salt that can be dissolved in water. As we add salt, its concentration in the water reaches a maximum and new salt is no longer dissolved; this is called saturation.
Then the boiling temperature no longer increases, since the concentration of salt remains the same.

Assume, we make three experiments, as in a problem above, but after the second experiment the water has reached a point of saturation.

Values of random variables S and T are in the table below (the last two columns both correspond to T=112, the saturation temperature).

|      | T=111 | T=112 | T=112 |
|------|-------|-------|-------|
| S=11 | 40    | 0     | 0     |
| S=12 | 0     | 35    | 0     |
| S=14 | 0     | 0     | 25    |
Calculate correlation coefficient R(S,T) of these two random variables.

Solution

S=11 in 40+0+0=40 cases,
S=12 in 0+35+0=35 cases,
S=14 in 0+0+25=25 cases,
Total number of observations
N = 40+35+25 = 100
Therefore, mean
E(S) =
= (11·40+12·35+14·25)/100 =
= 12.1

T=111 in 40+0+0=40 cases,
T=112 in 0+35+0=35 cases,
T=112 in 0+0+25=25 cases,
Total number of observations is still the same
N = 40+35+25 = 100
Therefore, mean
E(T) =
=(111·40+112·35+112·25)/100
=(111·40+112·60)/100 =
= 111.6

There are only three possible combinations of simultaneous values of S and T:
S·T=11·111 in 40+0+0=40 cases,
S·T=12·112 in 0+35+0=35 cases,
S·T=14·112 in 0+0+25=25 cases,
Total number of observations is still the same
N = 40+35+25 = 100
Therefore, mean
E(S·T) =
= (11·111·40 +
+ 12·112·35 +
+ 14·112·25)/100 =
= 1350.8

Now we can calculate the covariance between S and T:
Cov(S,T) =
= 1350.8−12.1·111.6 = 0.44

Next is the calculation of the variances of S and T.

Var(S) =
= [40·(11−12.1)² +
+ 35·(12−12.1)² +
+ 25·(14−12.1)²] / 100 =
= 1.39

Var(T) =
= [40·(111−111.6)² +
+ 35·(112−111.6)² +
+ 25·(112−111.6)²] / 100 =
= 0.24

The correlation coefficient between S and T is
R(S,T) = 0.44 / √(1.39·0.24) ≅
≅ 0.76

Obviously, the dependency between S and T is no longer linear, which causes the correlation to be smaller than 1.
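This calculation can also be checked in code; the sketch below recomputes R(S,T) from Problem B's frequency table:

```python
from math import sqrt

# Frequency table from Problem B: (S, T) pair -> number of occurrences
counts = {(11, 111): 40, (12, 112): 35, (14, 112): 25}
n = sum(counts.values())  # 100 observations

e_s   = sum(s * k for (s, _), k in counts.items()) / n
e_t   = sum(t * k for (_, t), k in counts.items()) / n
e_st  = sum(s * t * k for (s, t), k in counts.items()) / n
var_s = sum((s - e_s)**2 * k for (s, _), k in counts.items()) / n
var_t = sum((t - e_t)**2 * k for (_, t), k in counts.items()) / n

r = (e_st - e_s * e_t) / sqrt(var_s * var_t)
print(round(r, 2))  # 0.76
```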

## Monday, July 18, 2016

### Unizor - Probability - Correlation - Problems 6

Notes to a video lecture on http://www.unizor.com

Random Variables
Problems 6 (Correlation)

As always, try to solve any problems presented on this Web site just by yourself and check against the answers provided.
Only then study the suggested solutions.

Problem 6.1.
Consider two random variables, ξ and η, not necessarily independent, each taking two different values as follows:
P{ξ=x1} = 1/2
P{ξ=x2} = 1/2
P{η=y1} = 1/2
P{η=y2} = 1/2

Assume that
P{ξ=x1 & η=y1} = r

Analyze the domain and the range of correlation coefficient R(ξ,η) of these random variables.

Hint:
Use the formula of correlation R(ξ,η) in terms of r from the previous lecture ("Problems 5").

Domain:
r should be in the interval
[0,1/2]
Range: [−1,1]

Problem 6.2.
Consider two random variables, ξ and η, not necessarily independent, each taking two different values as follows:
P{ξ=x1} = 1/2
P{ξ=x2} = 1/2
P{η=y1} = 2/3
P{η=y2} = 1/3

Assume that
P{ξ=x1 & η=y1} = r

Analyze the domain and the range of correlation coefficient R(ξ,η) of these random variables.

Domain:
r should be in the interval
[1/6,1/2]
Range: [−√2/2,√2/2]

Problem 6.3.
Consider the same two random variables, ξ and η, not necessarily independent, each taking two different values as follows:
P{ξ=x1} = p
P{ξ=x2} = 1−p
P{η=y1} = q
P{η=y2} = 1−q

Assume that
P{ξ=x1 & η=y1} = r
and, for definiteness, p is not smaller than 1/2 and is not greater than q,
that is
1/2 ≤ p ≤ q
(we can always choose x1 and y1 as the values with probability greater than or equal to that of, correspondingly, x2 and y2)

Analyze the domain and the range of correlation coefficient R(ξ,η) of these random variables.

Domain:
r should be in the interval
[p+q−1,p]
Range:
from −√{[(1−p)(1−q)]/(pq)}
to √{[p(1−q)]/[q(1−p)]}
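As a sanity check, the endpoints of the domain can be plugged into the correlation formula from Problems 5; p and q below are arbitrary values satisfying 1/2 ≤ p ≤ q, and we take x1 > x2 and y1 > y2 so that K = +1:

```python
from math import sqrt

p, q = 0.6, 0.7          # arbitrary values with 1/2 <= p <= q

def corr(r):
    # R = K(r - pq)/sqrt(p(1-p)q(1-q)), with K = +1 here
    return (r - p*q) / sqrt(p*(1-p) * q*(1-q))

r_min, r_max = p + q - 1, p          # endpoints of the domain for r
lo, hi = corr(r_min), corr(r_max)

# The extreme correlations match the closed forms for the range
assert abs(lo + sqrt((1-p)*(1-q) / (p*q))) < 1e-12
assert abs(hi - sqrt(p*(1-q) / (q*(1-p)))) < 1e-12
print(round(lo, 4), round(hi, 4))
```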

## Friday, July 15, 2016

### Unizor - Statistics - Correlation - Introduction

Notes to a video lecture on http://www.unizor.com

Introduction to
Statistical Correlation

The foundation of statistical correlation (as of any other statistical subject) lies in the Theory of Probabilities. Please refer to the lectures on correlation in the "Probability - Random Variables - Correlation" chapter.

Theory of Probabilities introduces the concept of correlation between two random variables as a measure of dependency between them.
In particular, the correlation coefficient for two random variables is a number between −1 and +1, that is equal to zero for independent random variables and equals to +1 or −1 for linearly dependent random variables (as in case of η=A·ξ).

Statistical correlation is a methodology to evaluate the dependency between two random variables, represented by their statistical data, in the absence of information about their distribution of probabilities.

Recall the definition of a correlation coefficient between two random variables ξ and η:
R(ξ,η) =
Cov(ξ,η)/√[Var(ξ)·Var(η)]

where covariance between these random variables is defined as
Cov(ξ,η) =
E[(ξ−E(ξ))·(η−E(η))] =
E(ξ·η)−E(ξ)E(η)

As we see, correlation coefficient is expressed in terms of expectation and variance of each random variable and expectation of their product.

Suppose we have statistical data on the mutual distribution of two random variables, ξ and η. It is extremely important that our data represent the mutual distribution, which means that we know the simultaneous values of both variables under the same external conditions (like at the same time or at the same temperature etc.)
Then we can calculate not only their separate expectations E(ξ),E(η) and variances Var(ξ),Var(η), but also the expectation of their product E(ξ·η).

Assume we have statistical data on these random variables obtained in the course of N mutual (that is, under the same conditions, like at the same time or under the same pressure etc.) observations of them both. For example, we register the closing price of IBM corporation on the New York Stock Exchange (random variable ξ) and the Nasdaq 100 index (random variable η) on N sequential days:
ξ:(X1, X2,...XN)
η:(Y1, Y2,...YN)

Based on this information, we can calculate sample mean and variance of each of them as well as sample mean of their product:
E(ξ)=(ΣXk)/N
Var(ξ)={Σ[Xk−E(ξ)]²}/(N−1)
E(η)=(ΣYk)/N
Var(η)={Σ[Yk−E(η)]²}/(N−1)
E(ξ·η)=(ΣXk·Yk)/N
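As a sketch, the sample estimators above can be coded directly; the data below are made-up numbers, not real IBM or Nasdaq quotes:

```python
from math import sqrt

# Hypothetical paired observations (X for ξ, Y for η), N = 5
xs = [10.0, 12.0, 11.5, 13.0, 12.5]
ys = [1000.0, 1040.0, 1015.0, 1060.0, 1052.0]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
var_x = sum((x - mean_x)**2 for x in xs) / (n - 1)   # sample variance
var_y = sum((y - mean_y)**2 for y in ys) / (n - 1)
mean_xy = sum(x * y for x, y in zip(xs, ys)) / n     # sample mean of product

cov = mean_xy - mean_x * mean_y
r = cov / sqrt(var_x * var_y)
print(round(r, 3))
```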

Using the results of the above calculations, we can evaluate covariance and correlation coefficient of our two random variables.

The reasonable question arises now, how, based on the correlation coefficient, can we make a judgment about dependency between our two random variables and what is the reason for this dependency.

As we know, independent random variables have a correlation coefficient of zero. It does not mean that, if a correlation is zero, our random variables are necessarily independent. However, it is considered reasonable to assume that sample correlation that is close to zero (traditionally, between −0.1 and +0.1) indicates no observable dependency between random variables.

On the other end of a spectrum we know that linearly connected random variables (η=A·ξ) have correlation coefficient +1 (for positive factor A) or −1 (for negative factor A). It does not mean that, if a correlation is +1 or −1, our random variables are necessarily linearly connected. However, it is considered reasonable to assume that sample correlation that is close to 1 by absolute value (traditionally, in interval of [−1,−0.9] or in [0.9,1]) indicates very strong dependency between random variables.

Not very small and not very large correlation coefficients are interpreted subjectively and differently in different practical situations. Some textbooks recommend qualifying correlation in excess of 0.7 in absolute value as "strong", from 0.4 to 0.7 as "moderate", and from 0.1 to 0.4 as "weak", with correlation less than 0.1 in absolute value treated as "absent".

A couple of words about causality and correlation. These are two completely different concepts, though not unrelated under certain circumstances. Generally speaking, we can hypothesize that, if two random variables are strongly correlated, one of them might be a cause of the other. But not necessarily! First of all, which of the two is the cause and which the consequence? Secondly, both might be consequences of some other variable that is a cause of both.

If we have to choose which of two strongly correlated random variables is the cause, timing can help: the one observed earlier might be a cause (though not necessarily), while the one observed later cannot be a cause even theoretically.

## Tuesday, July 12, 2016

### Unizor - Random Variables - Problems 5

Notes to a video lecture on http://www.unizor.com

Random Variables
Problems 5 (Correlation)

As always, try to solve any problems presented on this Web site just by yourself and check against the answers provided.
Only then study the suggested solutions.

Problem 5.1.
Consider two random variables, ξ and η, not necessarily independent, each taking two different values as follows:
P{ξ=x1} = p
P{ξ=x2} = 1−p
P{η=y1} = q
P{η=y2} = 1−q

Assume that
P{ξ=x1 & η=y1} = r

What is the covariance of these random variables?

(x1−x2)·(y1−y2)·(r−pq)

Problem 5.2.
For the same variables as in the previous problem, calculate their correlation coefficient.

K(r−pq)/√[p(1−p)q(1−q)]
where K equals to +1 or −1 depending on a sign of an expression
(x1−x2)·(y1−y2)
that participates in a covariance of these random variables.
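Both answers can be verified numerically; the values below are arbitrary but form a valid joint distribution:

```python
from math import sqrt

x1, x2, y1, y2 = 5.0, 2.0, 4.0, 1.0
p, q, r = 0.6, 0.5, 0.4
# remaining joint probabilities (see Problems 4): p-r, q-r, 1+r-p-q, all >= 0

e_xy = x1*y1*r + x1*y2*(p-r) + x2*y1*(q-r) + x2*y2*(1+r-p-q)
e_x = x1*p + x2*(1-p)
e_y = y1*q + y2*(1-q)

cov = e_xy - e_x * e_y
assert abs(cov - (x1-x2)*(y1-y2)*(r - p*q)) < 1e-12        # Problem 5.1

k = 1 if (x1-x2)*(y1-y2) > 0 else -1
corr = cov / sqrt((x1-x2)**2 * p*(1-p) * (y1-y2)**2 * q*(1-q))
assert abs(corr - k*(r - p*q)/sqrt(p*(1-p)*q*(1-q))) < 1e-12  # Problem 5.2
print(round(corr, 4))
```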

## Monday, July 11, 2016

### Unizor - Random Variables - Problems 4

Notes to a video lecture on http://www.unizor.com

Random Variables
Problems 4

As always, try to solve any problems presented on this Web site just by yourself and check against the answers provided.
Only then study the suggested solutions.

Problem 4.1.
Consider a random variable ξ taking two different values as follows:
P{ξ=x1} = p
P{ξ=x2} = 1−p

What are its mathematical expectation and variance?

E(ξ)=x1·p+x2·(1−p)
Var(ξ)=(x1−x2)²·p·(1−p)

Problem 4.2.
Consider two random variables, ξ and η, not necessarily independent, each taking two different values as follows:
P{ξ=x1} = p
P{ξ=x2} = 1−p
P{η=y1} = q
P{η=y2} = 1−q

Assume that
P{ξ=x1 & η=y1} = r

What are the probabilities of all other combinations of values for these random variables, namely:
P{ξ=x1 & η=y2} = ?
P{ξ=x2 & η=y1} = ?
P{ξ=x2 & η=y2} = ?

P{ξ=x1 & η=y2} = p−r
P{ξ=x2 & η=y1} = q−r
P{ξ=x2 & η=y2} = 1+r−p−q

Problem 4.3.
Consider the same two random variables, ξ and η, as described in the previous problem.
What is the mathematical expectation of their product?

E(ξ·η) =
= x1·y1·r +
+ x1·y2·(p−r) +
+ x2·y1·(q−r) +
+ x2·y2·(1+r−p−q)
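The closed forms from Problems 4.1 and 4.2 can be sanity-checked with hypothetical numbers:

```python
# Problem 4.1: two-valued ξ with P(ξ=x1)=p, hypothetical values
x1, x2, p = 7.0, 3.0, 0.25

e = x1*p + x2*(1-p)                       # E(ξ)
var = (x1*x1*p + x2*x2*(1-p)) - e*e       # Var(ξ) = E(ξ²) − [E(ξ)]²
assert abs(var - (x1-x2)**2 * p*(1-p)) < 1e-12   # matches the closed form

# Problem 4.2: the four joint probabilities add up to 1
q, r = 0.6, 0.1
probs = [r, p - r, q - r, 1 + r - p - q]
assert all(pr >= 0 for pr in probs)
assert abs(sum(probs) - 1) < 1e-12
print(e, var)  # 4.0 3.0
```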

## Friday, July 8, 2016

### Unizor - Probability - Correlation

Notes to a video lecture on http://www.unizor.com

Random Variables
Correlation

In this lecture we will talk about independent and dependent random variables and will introduce a numerical measure of dependency between random variables.

Assume a random variable ξ takes values
x1, x2,..., xM
with probabilities
p1, p2,..., pM.
Further, assume a random variable η takes values
y1, y2,..., yN
with probabilities
q1, q2,..., qN.

A short reminder about independence of these random variables.
If the conditional probability of a random variable ξ taking any one of its values, under the condition that a random variable η took any one of its values, equals the unconditional probability of ξ taking that value, then the random variable ξ is independent of the random variable η.
In other words, ξ is independent of η if
P(ξ=xi | η=yj ) = P(ξ=xi )
where index i can take any value from 1 to M and index j can take any value from 1 to N.

Simple consequences of this definition, as discussed in previous lectures, are:
(a) independence is symmetrical, that is, if ξ is independent of η, then η is independent of ξ:
IF P(ξ=xi|η=yj)=P(ξ=xi)
THEN P(η=yj|ξ=xi)=P(η=yj)
(b) for independent random variables the probability of them to take simultaneously some values equals to a product of their probabilities to take these values independently:
P(ξ=xi ∩ η=yj ) =
P(ξ=xi ) · P(η=yj )

(c) The mathematical expectation of a product of two independent random variables equals the product of their mathematical expectations:
E(ξ·η) = E(ξ)·E(η)

The last property of mathematical expectations for independent random variables is the basis of measuring the degree of dependency between any pair of random variables.

First of all, we introduce a concept of covariance of any two random variables:
Cov(ξ,η) =
E[(ξ−E(ξ))·(η−E(η))]

Simple transformation by opening parentheses converts it into an equivalent definition:
Cov(ξ,η) = E(ξ·η)−E(ξ)E(η)

Now we see that for independent random variables their covariance equals to zero (see property (c) above).

Incidentally, the covariance of a random variable with itself (a kind of ultimate dependency) equals its variance:
Cov(ξ,ξ) = E(ξ·ξ)−E(ξ)E(ξ) =
E[(ξ−E(ξ))²] = Var(ξ)

Also notice that another example of very strong dependency, η = A·ξ, where A is a constant, leads to the following value of covariance:
Cov(ξ,Aξ) =
= E(ξ·Aξ)−E(ξ)E(Aξ) =
= A·E[(ξ−E(ξ))²] = A·Var(ξ)
This shows that, when the coefficient A is positive (that is, a positive change of ξ causes a positive change of η=A·ξ), the covariance between them is positive as well and proportional to A. If A is negative (that is, a positive change of ξ causes a negative change of η=A·ξ), the covariance between them is negative and still proportional to A.

One more example.
Consider "half-dependency" between ξ and η, defined as follows.
Let ξ' be an independent random variable, identically distributed with ξ.
Let η = (ξ + ξ')/2.
So, η "borrows" its randomness from two independent identically distributed random variables ξ and ξ'.
Then the covariance between ξ and η is:
Cov(ξ,η) = Cov(ξ,(ξ+ξ')/2) =
= E[ξ·(ξ+ξ')/2]−E(ξ)·E[(ξ+ξ')/2] =
= E(ξ²)/2+E(ξ·ξ')/2 −
− [E(ξ)]²/2−E(ξ)·E(ξ')/2

Since ξ and ξ' are independent, the expectation of their product equals the product of their expectations.
So, our expression can be transformed further:
= E(ξ²)/2+E(ξ)·E(ξ')/2 −
− [E(ξ)]²/2−E(ξ)·E(ξ')/2 =
= Var(ξ)/2
As we see, covariance between "half-dependent" random variables ξ and η=(ξ+ξ')/2, where ξ and ξ' are independent identically distributed random variables, equals to half of the variance of ξ.

All the above manipulations with covariance led us to some formulas where the variance plays a significant role. If we want a kind of measure that reflects the dependency between random variables not related to variances, but always scaled in the interval [−1,1], we have to scale the covariance by a factor that depends on the variances, thus forming the coefficient of correlation:
R(ξ,η) = Cov(ξ,η)/√[Var(ξ)·Var(η)]

Let's examine this coefficient of correlation in cases we considered above as examples.

For independent random variables ξ and η the correlation is zero because their covariance is zero.

Correlation between a random variable and itself equals 1:
R(ξ,ξ) = Cov(ξ,ξ)/√[Var(ξ)·Var(ξ)] = Var(ξ)/Var(ξ) = 1

Correlation between random variables ξ and Aξ equals 1 (for a positive constant A) or −1 (for a negative A):
R(ξ,Aξ) = Cov(ξ,Aξ)/√[Var(ξ)·Var(Aξ)] =
= A·Var(ξ)/[|A|·Var(ξ)] = A/|A|
which equals 1 or −1, depending on the sign of A.
This corresponds to our intuitive understanding of the rigid relationship between ξ and Aξ.

Correlation between "half-dependent" random variables, as introduced above, is:
R(ξ,(ξ+ξ')/2) = Cov(ξ,(ξ+ξ')/2)/(Var(ξ)·Var((ξ+ξ')/2) 2/2.

As we see, in all these examples the correlation is a number from the interval [−1,1] that equals zero for independent random variables, equals 1 or −1 for rigidly dependent random variables, and lies strictly inside this interval for partially dependent (like our "half-dependent" example) random variables.

For those interested, it can be proved that this statement is true for any pair of random variables.
So, the coefficient of correlation is a good tool to measure the degree of dependency between two random variables.
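The √2/2 value for the "half-dependent" pair can be confirmed by simulation; this sketch uses uniformly distributed ξ and ξ' (any common distribution would do):

```python
import random

random.seed(0)
n = 200_000

# ξ and ξ' are independent, identically distributed (uniform on [0,1) here);
# η = (ξ + ξ')/2 is "half-dependent" on ξ
xi  = [random.random() for _ in range(n)]
xi2 = [random.random() for _ in range(n)]
eta = [(a + b) / 2 for a, b in zip(xi, xi2)]

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    mu, mv = mean(u), mean(v)
    cov = mean([(a - mu) * (b - mv) for a, b in zip(u, v)])
    return cov / (mean([(a - mu)**2 for a in u]) *
                  mean([(b - mv)**2 for b in v])) ** 0.5

print(round(corr(xi, eta), 2))   # ≈ 0.71, i.e. √2/2
```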

## Tuesday, July 5, 2016

### Unizor - Probability - Expectation of Product

Notes to a video lecture on http://www.unizor.com

Independent Random Variables
Expectation of Product

Our goal in this lecture is to prove that expectation of a product of two independent random variables equals to a product of their expectations.

First of all, intuitively, this fact should be obvious, at least, in some cases.
When the expectation of a random variable is a value around which the results of random experiments are concentrated (like the temperature of a healthy person), the product of the results of two different experiments (the product of the temperatures of two different healthy persons) tends to concentrate around the product of their expectations.
In some other cases, when such a concentration does not take place (like flipping a coin), that same rule of multiplicative property of an expectation is still observed.

A very important detail, however, differentiates the property of a sum of two random variables from that of their product. The expectation of a sum always equals the sum of the expectations of its components. With a product, the analogous property is true only in case the components are INDEPENDENT random variables.

Let's approach this problem more formally and prove this theorem.

Consider the following two random experiments (sample spaces) and random variables defined on their elementary events.

Ω1=(e1, e2,..., eM)
with the corresponding measure of probabilities of these elementary events
P=(p1, p2,..., pM)
(that is, P(ei)=pi are non-negative numbers with their sum equal to 1)
and a random variable ξ defined for each elementary event as
ξ(ei) = xi, where i=1,2,...,M

Ω2=(f1, f2,..., fN)
with the corresponding measure of probabilities of these elementary events
Q=(q1, q2,..., qN)
(that is, Q(fj)=qj are non-negative numbers with their sum equal to 1)
and a random variable η defined for each elementary event as
η(fj) = yj, where j=1,2,...,N

Separately, the expectations of these random variables are:
E(ξ) = x1·p1+x2·p2+...+xM·pM
E(η) = y1·q1+y2·q2+...+yN·qN

To calculate the expectation of a product of these random variables, let's examine what values this product can take and with what probabilities.
Since every value of ξ can be observed with every value of η, we conclude that all the values of their product are described by all products xi·yj, where index i runs from 1 to M and index j runs from 1 to N.

Let's examine the probabilistic meaning of a product of two random variables defined on two different sample spaces.
Any particular value xi·yj is taken by a new random variable ζ=ξ·η defined on a new combined sample space Ω=Ω1×Ω2 that consists of all pairs of elementary events (ei, fj) with the corresponding combined measure of probabilities of these pairs equal to
R(ei, fj) = rij
where index i runs from 1 to M and index j runs from 1 to N.

Thus, we have defined a new random variable ζ=ξ·η on a new sample space Ω of M·N pairs of elementary events from the two old spaces Ω1 and Ω2 as follows:
ζ(ei, fj) = xi·yj
with probability rij

Before going any further, let's examine very important properties of the probabilities rij.
We have defined rij as the probability of a random experiment described by the sample space Ω1 resulting in elementary event ei and, simultaneously, a random experiment described by the sample space Ω2 resulting in elementary event fj.
Incidentally, if events from these two sample spaces are independent,
rij = pi·qj
because, for independent events, the probability of their simultaneous occurrence equals the product of the probabilities of their separate individual occurrences.

Keeping in mind the above properties of the probabilities rij, we can calculate the expectation of our new random variable ζ.
E(ζ) = E(ξ·η) =
= (x1·y1)·r11+...+(x1·yN)·r1N +
+ (x2·y1)·r21+...+(x2·yN)·r2N +
...
+ (xM·y1)·rM1+...+(xM·yN)·rMN

On the other hand, let's calculate the product of expectations of our random variable ξ and η:

E(ξ)·E(η) =
= (x1·p1+...+xM·pM)·
·(y1·q1+...+yN·qN) =
= (x1·y1)·p1q1+...+(x1·yN)·p1qN +
+ (x2·y1)·p2q1+...+(x2·yN)·p2qN +
...
+ (xM·y1)·pMq1+...+(xM·yN)·pMqN

Obviously, if random variables ξ and η are INDEPENDENT, the probability rij of ξ taking value xi and, simultaneously, η taking value yj equals the product of the corresponding probabilities pi·qj. In this case the expressions for E(ξ·η) and E(ξ)·E(η) are identical.

That proves that for INDEPENDENT random variables the mathematical expectation of their product equals the product of their mathematical expectations.
End of proof.
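A minimal numeric illustration of the theorem (the two distributions below are arbitrary examples):

```python
from itertools import product

# Two independent discrete random variables given as value -> probability maps
xi  = {1: 0.2, 2: 0.5, 3: 0.3}
eta = {10: 0.4, 20: 0.6}

e_xi  = sum(x * p for x, p in xi.items())    # E(ξ)
e_eta = sum(y * q for y, q in eta.items())   # E(η)

# By independence the joint probability of (x, y) is p·q, so:
e_prod = sum(x * y * p * q
             for (x, p), (y, q) in product(xi.items(), eta.items()))

assert abs(e_prod - e_xi * e_eta) < 1e-12    # E(ξ·η) = E(ξ)·E(η)
print(e_prod)
```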