*Notes to a video lecture on http://www.unizor.com*

__Statistical Correlation - Problems 1__

Assume two different experiments with numerical results are performed under the same conditions.

The statistical results of these experiments are used to determine whether there is a dependency between them.

For example, we measure the amount of salt added to cold water (random variable *S*) and the time it takes to boil it (random variable *T*).

The results of numerous experiments are collected in a table, where each row corresponds to an observed value of the first experiment (*S*), each column corresponds to an observed value of the second experiment (*T*), and the cell at the crossing of a row and a column contains the number of times the corresponding pair of results occurred.

Using these results, calculate the *correlation coefficient* between the random variables representing these two experiments.

*Problem A*

Values of random variables *S* and *T* are in the table below.

| | T=101 | T=102 | T=104 |
|---|---|---|---|
| S=1 | 40 | 0 | 0 |
| S=2 | 0 | 35 | 0 |
| S=4 | 0 | 0 | 25 |

Calculate the *correlation coefficient* **R**(S,T) of these two random variables.

*Solution*

To calculate the *correlation coefficient* **R**(S,T), we need to calculate the following mathematical expectations **E**() and variances **Var**():

**E**(S) - mean value of *S*

**E**(T) - mean value of *T*

**E**(S·T) - mean value of *S·T*

**Var**(S) - variance of *S*

**Var**(T) - variance of *T*

and put them into the formula for correlation:

**R**(S,T) = **Cov**(S,T) **/** √[**Var**(S)·**Var**(T)]

where

**Cov**(S,T) = **E**(S·T) − **E**(S)·**E**(T)

Random variable *S* takes the value

*S=1* in *40+0+0=40* cases,

*S=2* in *0+35+0=35* cases,

*S=4* in *0+0+25=25* cases.

Total number of observations

*N = 40+35+25 = 100*

Therefore, mean

**E**(S) = (1·40+2·35+4·25)/100 = 2.1

Random variable *T* takes the value

*T=101* in *40+0+0=40* cases,

*T=102* in *0+35+0=35* cases,

*T=104* in *0+0+25=25* cases.

Total number of observations is still the same

*N = 40+35+25 = 100*

Therefore, mean

**E**(T) = (101·40+102·35+104·25)/100 = 102.1
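The two means computed so far can be reproduced from the frequency table with a short Python sketch. The variable names (`s_values`, `t_values`, `counts`) are illustrative only and are not part of the original notes; the data is the table of Problem A.

```python
# Joint frequency table of Problem A: counts[i][j] is the number of
# observations with S = s_values[i] and T = t_values[j].
# All names here are illustrative assumptions, not from the lecture.
s_values = [1, 2, 4]
t_values = [101, 102, 104]
counts = [
    [40, 0, 0],
    [0, 35, 0],
    [0, 0, 25],
]

# Marginal frequencies: sum each row for S and each column for T.
s_counts = [sum(row) for row in counts]
t_counts = [sum(col) for col in zip(*counts)]
n = sum(s_counts)  # total number of observations

# Means as frequency-weighted averages of the observed values.
mean_s = sum(s * c for s, c in zip(s_values, s_counts)) / n
mean_t = sum(t * c for t, c in zip(t_values, t_counts)) / n

print(s_counts, t_counts, n)
print(round(mean_s, 4), round(mean_t, 4))
```

Summing rows gives the frequencies of *S* and summing columns gives the frequencies of *T*, which is exactly the *40+0+0*, *0+35+0*, *0+0+25* bookkeeping done by hand above.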

Only three combinations of simultaneous values of *S* and *T* were actually observed (all other cells of the table are zero):

*S·T=1·101=101* in *40* cases,

*S·T=2·102=204* in *35* cases,

*S·T=4·104=416* in *25* cases.

Total number of observations is still the same

*N = 40+35+25 = 100*

Therefore, mean

**E**(S·T) = (101·40+204·35+416·25)/100 = 215.8

Now we can calculate the *covariance* between *S* and *T*:

**Cov**(S,T) = 215.8 − 2.1·102.1 = 1.39
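As a sanity check of the covariance, the following Python sketch evaluates **E**(S·T) − **E**(S)·**E**(T) directly over the cells of the Problem A table. The helper `expectation` is a hypothetical name introduced only for this illustration.

```python
# Problem A frequency table (names are illustrative assumptions).
s_values = [1, 2, 4]
t_values = [101, 102, 104]
counts = [
    [40, 0, 0],
    [0, 35, 0],
    [0, 0, 25],
]
n = sum(sum(row) for row in counts)


def expectation(f):
    """Frequency-weighted mean of f(s, t) over all cells of the table."""
    total = 0
    for i, s in enumerate(s_values):
        for j, t in enumerate(t_values):
            total += f(s, t) * counts[i][j]
    return total / n


mean_s = expectation(lambda s, t: s)
mean_t = expectation(lambda s, t: t)
mean_st = expectation(lambda s, t: s * t)

# Covariance as the mean of the product minus the product of the means.
cov_st = mean_st - mean_s * mean_t

print(round(mean_st, 2), round(cov_st, 2))
```

Cells with a zero count contribute nothing to any of the sums, which is why only the three diagonal combinations matter in the hand calculation.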

Next is the calculation of the variances of *S* and *T*.

**Var**(S) = [40·(1−2.1)² + 35·(2−2.1)² + 25·(4−2.1)²] / 100 = 1.39

**Var**(T) = [40·(101−102.1)² + 35·(102−102.1)² + 25·(104−102.1)²] / 100 = 1.39
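The variances above, and the correlation coefficient that follows from them, can be double-checked with a small general-purpose function. This is only a sketch: `correlation_from_table` is a hypothetical helper, and it uses the centered form of the covariance, the mean of (S − **E**(S))·(T − **E**(T)), which is algebraically equal to **E**(S·T) − **E**(S)·**E**(T).

```python
from math import sqrt


def correlation_from_table(s_values, t_values, counts):
    """Correlation coefficient from a joint frequency table.

    counts[i][j] is the number of observations with
    S = s_values[i] and T = t_values[j]. Hypothetical helper,
    written for illustration only.
    """
    # Flatten the table into (s, t, count) triples.
    cells = [
        (s, t, counts[i][j])
        for i, s in enumerate(s_values)
        for j, t in enumerate(t_values)
    ]
    n = sum(c for _, _, c in cells)
    mean_s = sum(s * c for s, _, c in cells) / n
    mean_t = sum(t * c for _, t, c in cells) / n
    # Population variances and covariance (division by n, as in the notes).
    var_s = sum(c * (s - mean_s) ** 2 for s, _, c in cells) / n
    var_t = sum(c * (t - mean_t) ** 2 for _, t, c in cells) / n
    cov = sum(c * (s - mean_s) * (t - mean_t) for s, t, c in cells) / n
    return cov / sqrt(var_s * var_t)


r_a = correlation_from_table(
    [1, 2, 4],
    [101, 102, 104],
    [[40, 0, 0], [0, 35, 0], [0, 0, 25]],
)
print(round(r_a, 6))
```

Because of floating-point rounding the raw value may differ from 1 in the last digits, so the sketch rounds before printing.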

The *correlation coefficient* between *S* and *T* is

**R**(S,T) = 1.39 **/** √[1.39·1.39] = 1

This result might have been predicted since, within the framework of our experiments, there is an exact linear dependency between *S* and *T*: *T = 100 + S*.

*Problem B*

As we know, there is a limit to the amount of salt that can be dissolved in water. As we add salt, its concentration in the water can reach a maximum, after which new salt no longer dissolves. This is called saturation.

Then the boiling temperature will no longer increase, since the concentration of salt remains the same.

Assume we make three experiments, as in the problem above, but after the second experiment the water has reached the point of saturation.

Values of random variables *S* and *T* are in the table below.

| | T=111 | T=112 | T=112 |
|---|---|---|---|
| S=11 | 40 | 0 | 0 |
| S=12 | 0 | 35 | 0 |
| S=14 | 0 | 0 | 25 |

Calculate the *correlation coefficient* **R**(S,T) of these two random variables.

*Solution*

Random variable *S* takes the value

*S=11* in *40+0+0=40* cases,

*S=12* in *0+35+0=35* cases,

*S=14* in *0+0+25=25* cases.

Total number of observations

*N = 40+35+25 = 100*

Therefore, mean

**E**(S) = (11·40+12·35+14·25)/100 = 12.1

Random variable *T* takes the value

*T=111* in *40+0+0=40* cases,

*T=112* in *0+35+0=35* cases,

*T=112* in *0+0+25=25* cases.

Total number of observations is still the same

*N = 40+35+25 = 100*

Therefore, mean

**E**(T) = (111·40+112·35+112·25)/100 = (111·40+112·60)/100 = 111.6

Only three combinations of simultaneous values of *S* and *T* were actually observed:

*S·T=11·111* in *40* cases,

*S·T=12·112* in *35* cases,

*S·T=14·112* in *25* cases.

Total number of observations is still the same

*N = 40+35+25 = 100*

Therefore, mean

**E**(S·T) = (11·111·40 + 12·112·35 + 14·112·25)/100 = 1350.8

Now we can calculate the *covariance* between *S* and *T*:

**Cov**(S,T) = 1350.8 − 12.1·111.6 = 0.44

Next is the calculation of the variances of *S* and *T*.

**Var**(S) = [40·(11−12.1)² + 35·(12−12.1)² + 25·(14−12.1)²] / 100 = 1.39

**Var**(T) = [40·(111−111.6)² + 35·(112−111.6)² + 25·(112−111.6)²] / 100 = 0.24

The *correlation coefficient* between *S* and *T* is

**R**(S,T) = 0.44 **/** √[1.39·0.24] ≅ 0.76

The dependency between *S* and *T* is no longer linear, which caused the *correlation* to be smaller than 1.
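The Problem B result can also be cross-checked by expanding the frequency table into the 100 individual observations and computing the correlation from the raw pairs. This is a sketch under the assumption that each nonzero cell of the table stands for that many identical (S, T) measurements; the names `observed` and `pairs` are illustrative.

```python
from math import sqrt

# Nonzero cells of the Problem B table as (S, T, number of cases).
# The two columns labelled T=112 both carry the value 112.
observed = [(11, 111, 40), (12, 112, 35), (14, 112, 25)]

# Expand the table into one (s, t) pair per observation.
pairs = [(s, t) for s, t, count in observed for _ in range(count)]
n = len(pairs)

mean_s = sum(s for s, _ in pairs) / n
mean_t = sum(t for _, t in pairs) / n

# Population covariance and variances (division by n, as in the notes).
cov = sum((s - mean_s) * (t - mean_t) for s, t in pairs) / n
var_s = sum((s - mean_s) ** 2 for s, _ in pairs) / n
var_t = sum((t - mean_t) ** 2 for _, t in pairs) / n

r_b = cov / sqrt(var_s * var_t)
print(n, round(cov, 2), round(var_s, 2), round(var_t, 2), round(r_b, 2))
```

Working from the raw pairs gives the same numbers as the weighted sums over the table, since a frequency table is just a compact record of repeated observations.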