## Monday, March 28, 2016

### Unizor

Unizor - Creative Minds through Art of Mathematics - Math4Teens

Notes to a video lecture on http://www.unizor.com

Statistical Distribution Task A - Problem

The values our random variable ξ takes are discrete and theoretically known: they are X1, X2, ..., XK. However, the probabilities p1, p2, ..., pK of taking these values are unknown, and we need to evaluate them.

Problem
Three parties (the White Party, the Blue Party and the Red Party) have each nominated a candidate for the positions of President, Vice President and Defense Minister.
The winner of the election will become President, the candidate who takes second place will become Vice President, and the third-place candidate will become Defense Minister.
In order to predict the results of the presidential elections, a survey was organized among 4000 randomly chosen voters. The results of the survey are: 1480 people prefer a candidate from the White Party, 1320 people prefer a candidate from the Blue Party and 1200 people prefer a candidate from the Red Party.
What are the probabilities of winning for all candidates (PW, PB, PR) and what is the 95% certainty margin of error in each case (mW, mB, mR)?

Solution
The probabilities of winning are naturally approximated by empirical frequencies, that is, by random variables P'W, P'B and P'R whose values are obtained from the survey:
P'W = 1480/4000 = 0.37
P'B = 1320/4000 = 0.33
P'R = 1200/4000 = 0.30
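As a quick check of the arithmetic, here is a minimal Python sketch that divides each count by the sample size (the names `counts`, `N` and `freqs` are illustrative, not part of the lecture):

```python
# Survey counts from the problem statement.
counts = {"White": 1480, "Blue": 1320, "Red": 1200}
N = sum(counts.values())  # total number of surveyed voters

# Empirical frequency P' = (number of supporters) / N for each party.
freqs = {party: c / N for party, c in counts.items()}

for party, f in freqs.items():
    print(f"{party}: {f:.2f}")
```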

As for the margin of error, let's calculate it in two ways:
(a) simple and crude, using the rule 2σ ≤ 1/√N;
(b) using sample variance.

Method (a)
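Since P'W is the average of 4000 independent variables, each equal to 1 with probability PW and 0 otherwise, its variance is
Var(P'W) = PW(1−PW)/4000
(where PW is unknown).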
The crude evaluation of this variance is based on the inequality
p(1−p) ≤ 1/4 = 0.25
for all p from 0 to 1 (the range of all probabilities).
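The inequality above, and the bound on σ that follows from it, can be derived by completing the square (here N is the sample size; equality holds at p = 1/2):

```latex
p(1-p) = \frac{1}{4} - \left(p - \frac{1}{2}\right)^2 \le \frac{1}{4},
\qquad
\sigma^2 = \frac{p(1-p)}{N} \le \frac{1}{4N}
\;\Longrightarrow\;
2\sigma \le \frac{1}{\sqrt{N}}
```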
Therefore, for method (a) of evaluating the margin of error, we can use:
σ² ≤ 1/(4·4000);
σ ≤ 1/(2·√4000)
For 95% certainty we need the 2σ interval:
2σ ≤ 1/√4000 ≅ 0.0158
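A minimal Python sketch of the crude bound, assuming only N = 4000. Because the bound does not depend on the unknown p, a single number serves all three parties:

```python
import math

N = 4000

# Crude upper bound on the 95% margin of error: since p(1-p) <= 1/4,
# 2*sigma = 2*sqrt(p(1-p)/N) <= 1/sqrt(N) for any p in [0, 1].
crude_margin = 1 / math.sqrt(N)
print(round(crude_margin, 4))

# For comparison: the 2*sigma that would hold if p were exactly 0.37.
print(round(2 * math.sqrt(0.37 * 0.63 / N), 4))
```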

As we see, the crude and simple method (a) of evaluating the margin of error gives
2σ ≤ 0.0158.
Therefore,
mW ≤ 0.0158
mB ≤ 0.0158
mR ≤ 0.0158

With this margin of error, the probabilities fall into the following intervals:
PW ∈ [0.3542, 0.3858]
PB ∈ [0.3142, 0.3458]
PR ∈ [0.2842, 0.3158]

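With 95% certainty, the White Party candidate has a better chance of becoming President than either of the other candidates, since the White interval intersects neither the Blue nor the Red interval. The difference between Blue and Red, however, is not large enough to state with 95% certainty that their probabilities differ, since, as we see, their intervals intersect.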
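The interval comparison above can be sketched in Python as follows; the helper `overlap` is a hypothetical name chosen for illustration, and the margin is the crude 1/√N from method (a):

```python
import math

N = 4000
freqs = {"White": 0.37, "Blue": 0.33, "Red": 0.30}
m = 1 / math.sqrt(N)  # crude 95% margin, the same for every party

# 95% interval [P' - m, P' + m] for each party.
intervals = {p: (f - m, f + m) for p, f in freqs.items()}

def overlap(a, b):
    """Return True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

for p, (lo, hi) in intervals.items():
    print(f"{p}: [{lo:.4f}, {hi:.4f}]")
print("White/Blue overlap:", overlap(intervals["White"], intervals["Blue"]))
print("Blue/Red overlap:", overlap(intervals["Blue"], intervals["Red"]))
```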

Assuming the same proportions of opinions (0.37, 0.33 and 0.30), in the crude evaluation we need the number of experiments N to be large enough that 1/√N is at most half the distance between neighboring values; then the intervals around those values do not intersect. The smallest distance is 0.33−0.30 = 0.03, so we have to satisfy the inequality:
1/√N ≤ 0.03/2
N ≥ 4/0.03² ≈ 4444.4, that is, N ≥ 4445
It means that, to be 95% certain in our prediction for all three candidates, we need at least 4445 respondents if the smallest difference between empirical frequencies is 0.03.
Generally, for 95% certainty, if the smallest difference between empirical frequencies is d, we need
1/√N ≤ d/2
N ≥ 4/d²
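The general rule translates into a one-line function. A minimal sketch, where the name `min_sample_size` is hypothetical; it rounds 4/d² up, because N must be an integer (for d = 0.03, 4/d² ≈ 4444.4, so the smallest admissible N is 4445):

```python
import math

def min_sample_size(d):
    """Smallest integer N with 1/sqrt(N) <= d/2, i.e. N >= 4/d**2.

    Based on the crude bound 2*sigma <= 1/sqrt(N) from method (a).
    """
    return math.ceil(4 / d**2)

print(min_sample_size(0.03))
```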

Method (b)
This method is based on evaluating the variance using the sample data.

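Let β be the random variable that equals 1 if a randomly chosen voter prefers the White Party candidate and 0 otherwise, so that P'W is the average of 4000 independent copies of β. The sample variance of β is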
VarW = [1480(1−0.37)² +
+ 2520(0−0.37)²] /3999 ≅
≅ 0.2332
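To double-check the formula, one can build the 0/1 sample explicitly and let Python's `statistics.variance` compute it; that function uses the same N−1 denominator. The list `beta` is an illustrative reconstruction of the survey data, assuming each respondent is coded 1 (White) or 0 (other):

```python
import statistics

N, supporters = 4000, 1480

# Indicator data for beta: 1 for each White Party supporter, 0 otherwise.
beta = [1] * supporters + [0] * (N - supporters)

# Sample variance with the (N - 1) denominator, as in the formula above.
var_w = statistics.variance(beta)
print(round(var_w, 4))
```

For 0/1 data this sample variance always reduces to N·p(1−p)/(N−1), here 932.4/3999.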
This is only slightly smaller than the crude upper bound of 0.25 on the variance used in method (a).

The sample variance of the average of 4000 independent random variables, each distributed as β, is VarW/4000:
Var(P'W) ≅ 0.000058275
The 2σ in this case is
2σ(P'W) ≅ 0.0153
With 95% certainty, the probability falls into the interval:
PW ∈ [0.3547,0.3853]
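The steps of method (b) for the White Party can be reproduced in a few lines. This sketch uses the shortcut N·p(1−p)/(N−1) for the sample variance of 0/1 data, which equals the sum formula above; variable names are illustrative:

```python
import math

N = 4000
p_hat = 1480 / N  # P'W

# Sample variance of the indicator beta (equivalent to the sum formula).
var_beta = N * p_hat * (1 - p_hat) / (N - 1)

var_mean = var_beta / N               # estimated Var(P'W)
two_sigma = 2 * math.sqrt(var_mean)   # 95% margin of error

print(round(two_sigma, 4))
print(f"[{p_hat - two_sigma:.4f}, {p_hat + two_sigma:.4f}]")
```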

Analogous calculations for PB lead to the following results:
VarB = [1320(1−0.33)² +
+ 2680(0−0.33)²]/3999 ≅
≅ 0.2212
Therefore, the sample variance of the average of 4000 independent random variables, each distributed as the Blue Party indicator (1 for a Blue supporter, 0 otherwise), is VarB/4000:
Var(P'B) ≅ 0.0000553
The 2σ in this case is
2σ(P'B) ≅ 0.0149
With 95% certainty, the probability falls into the interval:
PB ∈ [0.3151,0.3449]
We see clearly that, with 95% certainty, the Blue Party candidate has a lower chance of becoming President than the White Party candidate.

Similarly, calculations for PR lead to the following results:
VarR = [1200(1−0.30)² +
+ 2800(0−0.30)²]/3999 ≅
≅ 0.2101
Therefore, the sample variance of the average of 4000 independent random variables, each distributed as the Red Party indicator (1 for a Red supporter, 0 otherwise), is VarR/4000:
Var(P'R) ≅ 0.0000525
The 2σ in this case is
2σ(P'R) ≅ 0.0145
With 95% certainty, the probability falls into the interval:
PR ∈ [0.2855,0.3145]

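As you see, with a more precise estimate of the variance (albeit one that is itself subject to some uncertainty), there is a clear distinction between the Blue and Red parties: their intervals [0.3151, 0.3449] and [0.2855, 0.3145] do not intersect.
With 95% certainty, the Red Party candidate has a lower chance than the Blue Party candidate of becoming Vice President.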
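Putting method (b) together for all three parties, a minimal Python sketch follows; `interval_b` is a hypothetical helper name, and the 2σ rule for 95% certainty is the same normal approximation the notes use:

```python
import math

N = 4000
counts = {"White": 1480, "Blue": 1320, "Red": 1200}

def interval_b(k, n):
    """95% interval P' +/- 2*sigma, with sigma estimated from the sample variance."""
    p = k / n
    var_beta = n * p * (1 - p) / (n - 1)  # sample variance of the 0/1 indicator
    m = 2 * math.sqrt(var_beta / n)       # 2*sigma of the average of n copies
    return (p - m, p + m)

intervals = {party: interval_b(k, N) for party, k in counts.items()}
for party, (lo, hi) in intervals.items():
    print(f"{party}: [{lo:.4f}, {hi:.4f}]")

# Ordered by estimate, neighboring intervals are disjoint
# when each lower bound lies above the next party's upper bound.
print(intervals["White"][0] > intervals["Blue"][1])  # White vs Blue
print(intervals["Blue"][0] > intervals["Red"][1])    # Blue vs Red
```

Note how narrow the Blue/Red separation is (0.3151 versus 0.3145): a slightly different variance estimate could close the gap, which is why method (a) could not separate them.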