Friday, February 19, 2016

Unizor - Statistical Distribution





Unizor - Creative Minds through Art of Mathematics - Math4Teens

Notes to a video lecture on http://www.unizor.com

Statistical Distribution

In this lecture we will attempt to analyze the real probabilistic laws governing the behavior of some random variable by observing the results of random experiments with it.
The task at hand is to find out, for some random variable, the probabilities with which it takes certain values, if we know what values it took in the past.

Let's separate our task into the following subtasks:

Task A. The values our random variable ξ takes are discrete and theoretically known - they are X1, X2... XK - but the probabilities of taking these values are unknown and we need to evaluate them.
Examples: rolling a standard cubical die (6 known outcomes), Bernoulli random variable (only two values - 1 and 0).

Task B. The values our random variable ξ takes are discrete and unknown, so we have to determine with some certainty both the values it might take and its probabilities to take these values.
Examples: number of people injured in auto accidents during some randomly chosen month, amount of money some randomly chosen family in the United States spends on entertainment during a year.

Task C. Our random variable ξ takes continuous range of values with theoretically known fixed boundaries - from a to b.
Examples: temperature of the water in a pot (from freezing to boiling), direction of the wind at a specific location (from 0° to 360° relative to the North).

Task D. Our random variable ξ takes continuous range of values with unknown one or both boundaries.
Examples: weight of a Buddha statue randomly chosen among all statues in Bangkok (no reasonable theoretical upper boundary), distance between two randomly chosen European cities (no reasonable theoretical upper boundary, except the size of Europe).

All tasks are approached similarly. We always start from some N random experiments with our variable ξ and register the results of these experiments - the values it took.

Now we have to choose different paths to solve the four tasks mentioned above.

Task A
Assume that, as the result of N experiments, random variable ξ that could theoretically take values X1, X2... XK, took value Xi in ni experiments, where i∈[1,K].
Obviously, n1+n2+...+nK=N.
Our best approximation for the probability of ξ to take value Xi is ni / N - the empirical frequency of occurrence of this event.
So, our statistical distribution of random variable ξ looks like
Prob{ξ=Xi} ≅ ni / N
The quality of this approximation is better with larger number of experiments and will be discussed separately.
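Task A can be illustrated with a minimal sketch (Python; the simulated fair die stands in for real experimental data and is not part of the lecture):

```python
import random

random.seed(1)

N = 10000  # number of experiments
K = 6      # theoretically known values X1...XK (faces of a die)

# simulate N rolls of a fair die (stand-in for real experimental data)
results = [random.randint(1, K) for _ in range(N)]

# n_i / N - the empirical frequency - approximates Prob{ξ = X_i}
counts = {x: results.count(x) for x in range(1, K + 1)}
distribution = {x: counts[x] / N for x in counts}

for x in sorted(distribution):
    print(x, round(distribution[x], 3))  # each value should be near 1/6
```

The frequencies always add up to 1, and with N=10,000 each is close to the true probability 1/6.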

Task B
Assume that, as the result of N experiments, random variable ξ, which theoretically takes some unknown values with unknown probabilities, took values x1, x2... xM, correspondingly, m1, m2... mM times (the sum of all mi equals N).
We can reduce this task to one similar to the previous case by following these steps:
1. Choose the minimum xmin and maximum xmax values among empirical results xi.
2. Divide the range from xmin to xmax into K equal intervals:
Δ1: [xmin=y0,y1],
Δ2: [y1,y2],
...
ΔK: [yK-1,yK=xmax]
3. For each interval Δi between xmin and xmax calculate the number of times our random variable ξ took a value within this interval. Assume, it's ni.
4. Use an empirical frequency of the value of random variable ξ to fall within each interval as a statistical distribution:
Prob{ξ∈Δi} ≅ ni / N
Try to avoid cases with a small number of experiments (say, fewer than 100), since the evaluation of probabilities in these cases will be far from precise.
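The four steps above can be sketched in code (Python; the function name and the sample data are made up for the example, and it assumes not all observed values are equal):

```python
def statistical_distribution(values, K):
    """Bin observed values into K equal intervals; return interval
    boundaries y_0..y_K and empirical frequencies n_i / N."""
    N = len(values)
    x_min, x_max = min(values), max(values)
    width = (x_max - x_min) / K
    counts = [0] * K
    for x in values:
        # index of the interval containing x; x_max goes into the last one
        i = min(int((x - x_min) / width), K - 1)
        counts[i] += 1
    bounds = [x_min + j * width for j in range(K + 1)]
    freqs = [n / N for n in counts]
    return bounds, freqs

# 8 made-up empirical values, binned into K = 4 intervals
bounds, freqs = statistical_distribution(
    [1.2, 3.4, 2.2, 4.9, 0.5, 3.3, 2.8, 4.1], K=4)
```

The same function serves Tasks C and D as well: only the choice of the boundaries differs.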

Task C
Here we are dealing with a continuously distributed random variable with values theoretically in the range from a to b.
We do have N empirical values this random variable took in a series of experiments.
The approach we can suggest is similar to the previous one.
We divide the range from a to b into K equal parts and count how many times the values fell into each part, thus getting the discrete distribution that approximates the real continuous distribution of our random variable.

Task D
If we have no theoretical knowledge about the range of values our random variable might take, we have no other choice but artificially assign its minimum and maximum empirical values to lower and upper boundaries (maybe, with some rounding).
When these boundaries are established, we continue the same way as in a previous task - divide the range between the boundaries into intervals and count the number of times the values fell into each interval to get empirical frequencies.

Tuesday, February 16, 2016

Unizor - Bernoulli Statistics - New Solution






Bernoulli Statistics -
New Solutions to Old Problems

In this lecture we will apply the sample variance instead of the upper bound of the real variance to obtain more precise, albeit slightly less certain, evaluation of statistical parameters.

Problem 1

A quality control at some parts manufacturer has determined that out of 10,000 sampled parts made by this manufacturer 300 were defective.
Determine the probability of manufacturing the defective part and a margin of error with the level of certainty equal to 0.9545.

Solution

Consider Bernoulli random variable ξ that takes the value 1 with an unknown probability P if a part is defective and takes the value 0 otherwise.
As we know, mathematical expectation and variance of this random variable are:
E(ξ) = P
Var(ξ) = P·(1−P)

Then the random variable representing a frequency of defective parts is expressed as
η = (ξ1+ξ2+...+ξN) / N
Here N=10,000, all ξi are independent random variables identically distributed as ξ and a single value of random variable η is 300/10000=0.03.

We assume that the distribution of η is close to Normal with mathematical expectation and variance, expressed in terms of unknown probability P as
E(η) = N·E(ξ) / N = P
Var(η) = σ² = N·Var(ξ) / N² = P(1−P) / N

Since the unknown probability P equals the mathematical expectation of η, and a single value of η is an unbiased approximation to this expectation, we can say that the probability P is approximately equal to 0.03.

To determine the margin of error, recall that for a Normal random variable with certainty level of 0.9545 its values are within an interval of 2σ from its mathematical expectation, where σ is a standard deviation.

Instead of using the upper bound of the standard deviation suggested in the first lecture that presented this problem (σ not greater than 1/(2√N) = 0.005),
we will use the sample variance calculated from the results of our 10,000 experiments that produced 300 defective parts.
The sample mean is
m = 300/10000 = 0.03
The sample variance is
s² = [300·(1−0.03)² + 9700·(0−0.03)²] / 9999 ≅ 0.0291
The sample standard deviation (square root of s²) is, therefore, approximately 0.17. This is a better (smaller) evaluation of the standard deviation of our random variable ξ than its upper bound 1/2 (corresponding to the variance upper bound 1/4).

Based on this more precise evaluation, the standard deviation of η is 0.17/√N = 0.17/100 = 0.0017.
The 2σ rule, therefore, says that with certainty level 0.9545 the real probability of manufacturing a defective part is from 0.03−0.0034=0.0266 to 0.03+0.0034=0.0334.
Symbolically, it looks like this:
Prob{0.0266 ≤ P ≤ 0.0334} ≅ 0.9545

This is a narrower interval than [0.02;0.04] that we have obtained using the upper bound evaluation of standard deviation - a simpler but rather crude method.

So, our evaluation of probability P is more precise when using sample variance. However, certain element of uncertainty (relatively small for large samples) was introduced when we approximated real variance with its sample-based value.
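The numbers above are easy to verify programmatically; here is a short Python check of the same computation:

```python
from math import sqrt

N, defects = 10_000, 300
m = defects / N   # sample mean, an unbiased estimate of P: 0.03

# unbiased sample variance s² (divisor N−1), as in the lecture
s2 = (defects * (1 - m) ** 2 + (N - defects) * (0 - m) ** 2) / (N - 1)
s = sqrt(s2)                 # sample standard deviation of ξ, ≈ 0.17

sigma_eta = s / sqrt(N)      # estimated standard deviation of η, ≈ 0.0017
low, high = m - 2 * sigma_eta, m + 2 * sigma_eta  # 2σ rule, certainty ≈ 0.9545
print(round(low, 4), round(high, 4))  # ≈ 0.0266 and 0.0334
```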

Problem 2

A quality control at some parts manufacturer has determined that out of 10,000 sampled parts made by this manufacturer 300 were defective.
What certainty level can we attribute to the following (narrower than in a previous problem) evaluation of probability P of manufacturing a defective part:
P∈[0.0283;0.0317]

Solution

Notice that in this case we are talking about a margin of error of 0.0017 around the empirical sample average 0.03, which we can use as an unbiased evaluation of probability P. This margin of error equals the sample-based estimate of the standard deviation of our random variable η - a relatively good approximation of its real standard deviation. As we know, the probability of a normal random variable to be within σ of its mathematical expectation equals 0.6825.
Therefore, the level of certainty for this evaluation is:
Prob{P∈[0.0283;0.0317]} ≅ 0.6825
As you see, more precise evaluation can be made with less certainty.

Problem 3

Now our purpose is to determine the volume N of the sample set of parts required to evaluate the probability of manufacturing a defective part within a margin of error Δ=0.001 with certainty level p=0.9545.

A crude evaluation of this standard deviation σ based on its upper bound 1/(2√N), together with the 2σ rule (Δ = 2σ = 0.001, hence σ = 0.0005), gives us the value of N as a solution to the equation
1/(2√N) = 0.0005
which is N=1000²=1,000,000.

We would like to use a more precise sample variance evaluation instead of a crude upper bound of it to reduce the required number of experiments, but the problem is - we don't have a sample yet, we want to evaluate its volume before we do real experiments.
Watch the video for a suggested solution.

Tuesday, February 9, 2016

Unizor - Bernoulli Statistics - Sample Variance






Bernoulli Statistics - Sample Variance

The most frequently used procedure to better evaluate the variance of η is to use sample variance that can be calculated based on existing values from N experiments x1, x2...xN.

The variance of a discrete random variable is a probability-weighted average of squares of its deviation from its mathematical expectation.
Inasmuch as we have accepted a sample average of values our random variable ξ took in N experiments
m = (x1+x2+...+xN) / N
as a substitution for E(ξ)=P, it seems reasonable to accept a sample average of squares of deviations from its sample mean
sN² = [(x1−m)²+(x2−m)²+...+(xN−m)²] / N
as a substitution for Var(ξ)=P(1−P).

Analogously to analyzing the bias and margin of error of the sample mean m as an estimate of E(ξ), we need to analyze two issues with this substitution sN² for Var(ξ) - its bias and its margin of error.

An estimate m of probability P is unbiased, and that is a very desirable property of any estimate.
Considering m as a single value of a random variable
η = (ξ1+ξ2+...+ξN) / N
and calculating the mathematical expectation of η, we had E(η)=P, which confirms that m is an unbiased estimate of E(ξ)=P.

It would be very much desirable for our estimate sN² of Var(ξ) to be unbiased as well.
To determine this, we have to consider sN² as a single value of a random variable
ζ = [(ξ1−η)²+(ξ2−η)²+...+(ξN−η)²] / N
and check if its mathematical expectation E(ζ) equals to Var(ξ)=P(1−P).
If it is, our estimate sN² would be an unbiased estimate of Var(ξ) and, using sN² instead of an upper bound 1/4 for variance Var(ξ), we might more precisely evaluate the quality of using sample mean m as an estimate of an unknown probability P.

Let's do the calculations, keeping in mind that all ξi are independent random variables identically distributed as ξ and η is their arithmetic average.
E(ζ) = E{[(ξ1−η)²+(ξ2−η)²+...+(ξN−η)²]/N}
By symmetry, all N terms in the sum have the same expectation, so, substituting η=(ξ1+...+ξN)/N,
E(ζ) = E{(ξ1−η)²} = E{[N·ξ1−(ξ1+...+ξN)]²}/N²

Let's separately evaluate the expectation E{...} in the above expression without a factor N² in the denominator:
E{[Nξ1−(ξ1+...+ξN)]²} =
N²·E{ξ1²}−2N·E{ξ1·(ξ1+...+ξN)}+E{(ξ1+...+ξN)²}

Consider each expectation in the above expression separately:
E{ξ1²} = 1²·P+0²·(1−P) = P

Taking into consideration that all ξi are independent and identically distributed, we can use the property of mathematical expectation of a product of two independent variables to be equal to a product of their expectation:
E{ξ1·(ξ1+...+ξN)} = P + (N−1)P²

In the last component, if we square the sum of N variables, we will have N² components added up, N of them being squares of each ξi and the rest N²−N components being products of mixed indexed independent variables.
Therefore,
E{(ξ1+...+ξN)²} = N·P·[1+(N−1)·P]

Combining all the results together to calculate the numerator of E(ζ) (recall that the denominator is N²), we get:
N²·E(ζ) = N²·P − 2N·[P+(N−1)·P²] + N·P·[1+(N−1)·P] =
= N·P·(N−1−N·P+P) = N·(N−1)·P·(1−P)
Hence,
E(ζ) = P·(1−P)·(N−1)/N

As we see, the mathematical expectation of
ζ = [(ξ1−η)²+(ξ2−η)²+...+(ξN−η)²] / N
is not exactly the same as Var(ξ)=P·(1−P).
Granted, for large N the difference is small and tends to zero as N→∞, but still the evaluation is biased, that is, not centered on the value we want to evaluate.

An easy solution to this problem is, instead of ζ, use
θ = ζ·N/(N−1)
That is,
θ = [(ξ1−η)²+(ξ2−η)²+...+(ξN−η)²] / (N−1)
In this case E(θ) will be exactly equal to Var(ξ):
E(θ) = E(ζ)·N/(N−1) = P(1−P)

This implies that an unbiased evaluation of an unknown variance of a Bernoulli random variable ξ based on a sample of its N values
x1, x2...xN should be
s² = [(x1−m)²+(x2−m)²+...+(xN−m)²] / (N−1)
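A quick simulation illustrates the bias and its correction (a Python sketch; the particular P, N and number of trials are arbitrary choices, not from the lecture):

```python
import random

random.seed(2)

P, N, TRIALS = 0.3, 5, 200_000  # arbitrary illustration parameters

def sample_var(xs, divisor):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

sum_biased = sum_unbiased = 0.0
for _ in range(TRIALS):
    xs = [1 if random.random() < P else 0 for _ in range(N)]
    sum_biased += sample_var(xs, N)        # ζ: divisor N
    sum_unbiased += sample_var(xs, N - 1)  # θ: divisor N−1

true_var = P * (1 - P)
# averages approximate E(ζ) = Var(ξ)·(N−1)/N and E(θ) = Var(ξ)
print(sum_biased / TRIALS, sum_unbiased / TRIALS, true_var)
```

Averaged over many series, the divisor-N version systematically falls short of Var(ξ) by the factor (N−1)/N, while the divisor-(N−1) version centers on Var(ξ), exactly as the derivation predicts.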

In the next lecture we will use this evaluation of the Var(ξ) in the problems presented in the previous lecture to obtain more realistic numbers for margin of error and certainty level.

Still open is the question of how good our evaluation of Var(ξ) really is. After all, if it's not a good evaluation - that is, if the deviation from its mean, Var(θ), is too large - our calculations of margin of error and certainty level are not as precise as we'd like them to be.
The exact calculations of Var(θ), which is a measure of precision of our evaluation of Var(ξ) with the sample variance s² (the one with divisor N−1), were done by mathematicians and are known quite well, but they lie outside of the scope of this course. However, it's important to know that Var(θ) tends to zero as N→∞ on the order of 1/N, which seems intuitively correct.

Monday, February 8, 2016

Unizor - Bernoulli Statistics - Problems






Bernoulli Statistics - Problems

Problem 1

A quality control at some parts manufacturer has determined that out of 10,000 sampled parts made by this manufacturer 300 were defective.
Determine the probability of manufacturing the defective part and a margin of error with the level of certainty equal to 0.9545.

Solution

Consider Bernoulli random variable ξ that takes the value 1 with an unknown probability P if a part is defective and takes the value 0 otherwise.
As we know, mathematical expectation and variance of this random variable are:
E(ξ) = P
Var(ξ) = P·(1−P)

Then the empirical frequency of defective parts is expressed as
η = (ξ1+ξ2+...+ξN) / N
Here N=10,000 and a single value of random variable η is 300/10000=0.03.

We assume that the distribution of η is close to Normal with mathematical expectation and variance, expressed in terms of unknown probability P as
E(η) = N·E(ξ) / N = P
Var(η) = σ² = N·Var(ξ) / N² =
= P(1−P) / N which is not greater than 1 / (4N)

Since the unknown probability P equals the mathematical expectation of η, and a single value of η is an unbiased approximation to this expectation, we can say that the probability P is approximately equal to 0.03.

To determine the margin of error, recall that for a Normal random variable with certainty level of 0.9545 its values are within an interval of 2σ from its mathematical expectation, where σ is a standard deviation.
In our case, though we don't know σ exactly, we know that it is bounded from above:
σ is not greater than 1/(2√N) = 0.005

Therefore, with certainty level of 0.9545 we can state that the unknown probability P is at a distance (that is, with a margin of error) of no more than
2σ = 1/√N = 0.01
from empirical expectation 0.03, which can be expressed as:
Prob{P∈[0.02;0.04]} is not less than 0.9545

Problem 2

A quality control at some parts manufacturer has determined that out of 10,000 sampled parts made by this manufacturer 300 were defective.
What certainty level can we attribute to the following (narrower than in a previous problem) evaluation of probability P of manufacturing a defective part:
P∈[0.025;0.035]

Solution

Notice that in this case we are talking about a margin of error of 0.005 around the empirical sample average 0.03, which we can use as an unbiased evaluation of probability P. This margin of error equals the upper bound 1/(2√N) of the standard deviation σ of our random variable η. As we know, the probability of a normal random variable to be within σ of its mathematical expectation equals 0.6825.
Therefore, the level of certainty for this evaluation is:
Prob{P∈[0.025;0.035]} is not less than 0.6825
As you see, more precise evaluation can be made with less certainty.

Problem 3

Now our purpose is to determine the volume N of the sample set of parts required to evaluate the probability of manufacturing a defective part within a margin of error Δ=0.005 with certainty level p=0.9545.

Solution
The required certainty level can be assured with evaluation of probability P by its empirical value (presumed, normally distributed) if margin of error Δ is equal to 2σ.
Since σ is not greater than 1/(2√N) and the margin of error is Δ = 2σ = 0.005, we can find N:
2/(2√N) = 1/√N = 0.005
N = 200² = 40,000
So, it takes examining 40,000 parts to achieve the precision of our evaluation within a margin of error of 0.005.
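The same calculation in code (Python):

```python
# margin of error Δ = 2σ ≤ 2·[1/(2√N)] = 1/√N, hence N = (1/Δ)²
delta = 0.005
N = round((1 / delta) ** 2)
print(N)  # 40000 parts to examine
```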

Thursday, February 4, 2016

Unizor - Bernoulli Statistics - Solution






Bernoulli Statistics - Solution

Obviously, one series of N experiments (Bernoulli trials in our case) produces a set of results that is different from another series of N experiments.
Our goal is to evaluate probabilistic characteristics of a Bernoulli random variable ξ based on a series of experiments with it. But, since different series produce different results, we must consider the results of our series of N experiments as a set of independent random variables, identically distributed as ξ, that we denote ξ1, ξ2,...,ξN.

The only probabilistic characteristic of Bernoulli random variable ξ that we are interested in is the probability P of ξ to take a value of 1, since all other properties (like probability of taking a value of 0 or variance) follow from it.
Since E(ξ)=P, we will attempt to approximate it with a random variable
η = (ξ1+ξ2+...+ξN) / N

Let's define a concept of a good approximation of a constant P with a value of a random variable η. Since η is a random variable, it can take many values with different probabilities. It makes sense to say that η is a good approximation of P if values η can take are relatively close to P and those values that are closer are more probable, while those that are farther from P are less probable.

A good measure of this quality of approximation η≅P might be mathematical expectation E(η) (that we hope should be very close or even equal to P) and variance E{[η−E(η)]²} - a measure of how closely values of η lie around its mathematical expectation.

E(ξ) = 1·P + 0·(1−P) = P
Var(ξ) = E[ξ − E(ξ)]² = P·(1−P)
Based on these, we could calculate the mathematical expectation and variance of η:
E(η) = N·E(ξ)/N = P
Var(η) = N·Var(ξ)/N² = Var(ξ)/N = P(1−P)/N

Although we don't know the probability P=Prob{ξ=1} and, consequently, we don't know Var(η), we can evaluate its range.
Since Var(ξ)=P(1−P), we can analyze the quadratic polynomial y=x(1−x) on a segment [0,1] and find its maximum.
The maximum value of y at x=1/2 is 1/4

Therefore,
Var(η) = Var(ξ)/N ≤ 1/(4N)

Let's understand what random variable η is and what is its distribution.
Obviously, this is a variable with Binomial distribution that can take values from 0 to 1:
0/N=0, 1/N, 2/N...N/N=1
(see the corresponding lecture in this course by following the links Probability - Binary Distributions - Binomial).
This is a rather complicated distribution, and finding an interval Δ around its expectation that corresponds to a given level of certainty p is not easy, so we will take a shortcut that approximates this distribution with the Normal distribution along the bell curve.

As we know from the Central Limit Theorem (see Probability - Normal Distribution in this course), given certain rather liberal conditions, the average of a large number of random variables behaves approximately like a normal random variable with the same expectation and variance.
Assuming the number of experiments N is rather large to justify this approximation, let's use this property and approximate the distribution of our random variable η with a normal random variable that has mathematical expectation P and variance 1/(4N).
The variance of η is, actually, less than or equal to 1/(4N), so our estimate for Δ might be a little wider than necessary, which works in our favor, since the real precision of our estimate of probability P will be slightly better than calculated based on properties of a normal random variable.

Let's use "sigma limits" for a normal variable with expectation P and variance 1/(4N), whose standard deviation therefore equals σ=1/(2√N), to determine the margin of error for three typical cases:
(a) for p≅0.6825 margin of error is not greater than σ=1/(2√N)
(b) for p≅0.9545 margin of error is not greater than 2σ=1/(√N)
(c) for p≅0.9973 margin of error is not greater than 3σ=3/(2√N)

Using the above calculations, for given number of experiments N and one of three typical certainty levels we found a margin of error.
It is easy to assume that certainty level and margin of error are given, in which case we can easily resolve the above equations for N - required minimum number of experiments to achieve required margin of error and certainty level.
Finally, we can determine a certainty level p for given number of experiments and margin of error.

As an example, with N=100 and p=0.9545 ("two sigma" case) the margin of error does not exceed 1/10. That means that the empirical frequency of ξ=1, obtained as a result of 100 experiments, deviates from the real probability Prob{ξ=1} by no more than 1/10 with a certainty level of 0.9545.
With 10,000 experiments our precision is greater, we could say that margin of error does not exceed 0.01 with the same certainty.
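The three "sigma limit" cases and the two numeric examples just given can be reproduced with a short Python sketch (the helper function is introduced here for illustration only):

```python
from math import sqrt

def margins(N):
    """Upper bounds of the margin of error for the three typical
    certainty levels, using sigma <= 1/(2*sqrt(N))."""
    sigma = 1 / (2 * sqrt(N))
    return {0.6825: sigma, 0.9545: 2 * sigma, 0.9973: 3 * sigma}

m100 = margins(100)        # "two sigma" margin: 0.1, as in the text
m10000 = margins(10_000)   # "two sigma" margin: 0.01
```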

Wednesday, February 3, 2016

Unizor - Bernoulli Statistics - Task






Bernoulli Statistics - Task

As we noted, the purpose of Mathematical Statistics is, using the past observations, to evaluate the probability of certain events to be able to predict their occurrence in the future.

Let's recall the Bernoulli trials and Bernoulli random variables. Coin tossing is an example. An experiment has only two outcomes, and we associate with it a random variable that takes a value of 1 for one outcome and a value of 0 for the other. Only one parameter defines the distribution of probabilities of this random variable - the probability P of one of the outcomes. So, assume that our Bernoulli random variable ξ takes a value of 1 with probability P and 0 with probability 1−P.

Let's consider the simplest task of Mathematical Statistics.
Consider a series of Bernoulli trials (say, one coin tossed multiple times, or multiple coins tossed simultaneously once): we repeat the experiment with our random variable N times, independently of each other and under identical conditions, so that all experiments have the same distribution of probabilities. Knowing the results of these experiments - ξ1, ξ2 ...ξN, that is, the values ξ took at each experiment - we need to determine the probability P that completely defines the distribution of probabilities of our random variable ξ.

In the Theory of Probabilities part of this course we rather intuitively introduced the probability P of an event as a limit of the frequency of the event's occurrence in independent experiments under identical conditions, as the number of experiments increases to infinity. We did not go deeper into the problem of the existence of this limit.
In our case of Bernoulli random variables, the sum of values of individual experiments (that is, ξi=1 if event happened and ξi=0 otherwise) divided by the number of these experiments is exactly that frequency. So, our very intelligent guess is that this average value, calculated on results of experiments, would be a good approximation of probability P:
η = (ξ1+ξ2+...+ξN)/N ≅ P
We also hope that, as the number of experiments N increases, the evaluation of probability P would be better and better.
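This hope can be illustrated with a simulation (Python; here the true P = 0.7 is an arbitrary value known only to the simulation, playing the role of the unknown probability):

```python
import random

random.seed(3)
P = 0.7  # the "unknown" probability, used only to generate the trials

def eta(N):
    # arithmetic average of N Bernoulli trials - one observed value of η
    return sum(1 if random.random() < P else 0 for _ in range(N)) / N

for N in (10, 1000, 100_000):
    print(N, eta(N))  # estimates should get closer to P as N grows
```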

Basically, we have formulated a task - evaluation of probability P, we also consider the solution to this task - using the arithmetic average of the results of experiments
η = (ξ1+ξ2+...+ξN)/N

All we have to do now is to determine if this solution is indeed giving us some evaluation of probability P and how good this evaluation really is.

Let's think about our task.
We would like to approximate an unknown constant P (that is, Prob{ξ=1} - probability of our Bernoulli random variable ξ to take a value of 1) with a single value of a random variable
η = (ξ1+ξ2+...+ξN)/N
(the average of results of a series of N individual experiments with random variable ξ: ξ1, ξ2 ...ξN, that can be considered as one combined experiment).
Another such combined experiment will produce a different value of η. That's why we consider η to be a random variable.

We have to think now about how close single values of η approximate constant P, whether this approximation depends on the number of individual experiments in a series, how to express this approximation quantitatively and what are the required parameters of our experimentation to achieve the needed precision of approximation.
These are the subjects of the next lecture about solution of our task.