## Tuesday, February 16, 2016

### Unizor - Bernoulli Statistics - New Solution

Unizor - Creative Minds through Art of Mathematics - Math4Teens

Notes to a video lecture on http://www.unizor.com

Bernoulli Statistics -
New Solutions to Old Problems

In this lecture we will apply the sample variance instead of the upper bound of the real variance to obtain a more precise, albeit slightly less certain, evaluation of statistical parameters.

Problem 1

Quality control at a parts manufacturer has determined that out of 10,000 sampled parts made by this manufacturer 300 were defective.
Determine the probability of manufacturing a defective part and a margin of error with the level of certainty equal to 0.9545.

Solution

Consider a Bernoulli random variable ξ that takes the value 1 with an unknown probability P if a part is defective and takes the value 0 otherwise.
As we know, the mathematical expectation and variance of this random variable are:
E(ξ) = P
Var(ξ) = P·(1−P)

Then the random variable representing the frequency of defective parts is expressed as
η = (ξ1+ξ2+...+ξN) / N
Here N=10,000, all ξi are independent random variables identically distributed as ξ and a single value of random variable η is 300/10000=0.03.

We assume that the distribution of η is close to Normal with mathematical expectation and variance, expressed in terms of unknown probability P as
E(η) = N·E(ξ) / N = P
Var(η) = σ² = N·Var(ξ) / N² = P(1−P) / N
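
The two formulas above can be checked with a small simulation. The sketch below is only an illustration, not part of the lecture: it assumes a "true" probability of 0.03 (in the real problem P is unknown), and the function name `simulate_frequencies`, the number of repetitions and the random seed are arbitrary choices.

```python
import random
import statistics

def simulate_frequencies(p, n, repetitions, seed=0):
    """Simulate `repetitions` values of eta, the defect frequency among n parts."""
    rng = random.Random(seed)  # fixed seed so that repeated runs give the same numbers
    return [
        sum(rng.random() < p for _ in range(n)) / n  # one value of eta
        for _ in range(repetitions)
    ]

# p_true is an assumed value used only to drive the simulation.
p_true, n = 0.03, 10_000
etas = simulate_frequencies(p_true, n, repetitions=200)

theory_variance = p_true * (1 - p_true) / n  # P(1-P)/N
print("mean of eta:    ", statistics.mean(etas), " theory:", p_true)
print("variance of eta:", statistics.variance(etas), " theory:", theory_variance)
```

With only 200 repetitions the simulated variance should be expected to agree with P(1−P)/N only roughly; the agreement improves as the number of repetitions grows.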

Since the unknown probability P equals the mathematical expectation of η and a single value of η is an unbiased approximation to this expectation, we can say that the probability P is approximately equal to 0.03.

To determine the margin of error, recall that a Normal random variable takes values within 2σ of its mathematical expectation with a certainty level of 0.9545, where σ is its standard deviation.

Instead of using the upper bound of the standard deviation, as was suggested in the first lecture that presented this problem (that is, σ is not greater than 1/(2√N) = 0.005),
we will use the sample variance calculated from the results of our 10,000 experiments, which produced 300 defective parts.
The sample mean is
m = 300/10000 = 0.03
The sample variance is
s² = [300·(1−0.03)² + 9700·(0−0.03)²] / 9999 ≅ 0.0291
The sample standard deviation (square root of s²) is, therefore, approximately 0.17. This is a better (smaller) evaluation of the standard deviation of our random variable ξ than its upper bound 1/2.

Based on this more precise evaluation, the standard deviation of η is 0.17/√N = 0.17/100 = 0.0017.
The 2σ rule, therefore, says that with certainty level 0.9545 the real probability of manufacturing a defective part is between 0.03−0.0034=0.0266 and 0.03+0.0034=0.0334.
Symbolically, it looks like this:
Prob{0.0266 ≤ P ≤ 0.0334} ≅ 0.9545
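
The arithmetic of this solution can be reproduced in a few lines. This is a minimal sketch under the same Normal approximation; the function name `bernoulli_interval` and the parameter `k` (the number of standard deviations, 2 for certainty level 0.9545) are illustrative names, not taken from the lecture.

```python
import math

def bernoulli_interval(defective, n, k=2.0):
    """Return sample mean, sample variance, sigma of eta and the k-sigma interval."""
    m = defective / n  # sample mean: the observed frequency of defective parts
    # Sample variance: `defective` observations equal 1, the rest equal 0.
    s2 = (defective * (1 - m) ** 2 + (n - defective) * (0 - m) ** 2) / (n - 1)
    sigma_eta = math.sqrt(s2) / math.sqrt(n)  # standard deviation of the frequency eta
    margin = k * sigma_eta
    return m, s2, sigma_eta, (m - margin, m + margin)

m, s2, sigma_eta, (low, high) = bernoulli_interval(300, 10_000)
print(f"m = {m}, s^2 = {s2:.4f}, sigma(eta) = {sigma_eta:.4f}")
print(f"interval: [{low:.4f}, {high:.4f}]")
```

Calling the function with a different `k` gives the interval for another certainty level, for example `k=1` for the one-sigma interval.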

This is a narrower interval than the interval [0.02;0.04] that we obtained using the upper bound evaluation of the standard deviation - a simpler but rather crude method.

So, our evaluation of probability P is more precise when using the sample variance. However, a certain element of uncertainty (relatively small for large samples) was introduced when we approximated the real variance with its sample-based value.

Problem 2

Quality control at a parts manufacturer has determined that out of 10,000 sampled parts made by this manufacturer 300 were defective.
What certainty level can we attribute to the following (narrower than in the previous problem) evaluation of probability P of manufacturing a defective part:
P∈[0.0283;0.0317]

Solution

Notice that in this case we are talking about a margin of error of 0.0017 around the empirical sample average 0.03, which we can use as an unbiased evaluation of probability P. This margin of error equals the sample-based estimate of σ (the sample standard deviation of ξ, 0.17, divided by √N = 100), a relatively good approximation of the standard deviation of our random variable η. As we know, the probability that a normal random variable lies within σ of its mathematical expectation equals 0.6827.
Therefore, the level of certainty for this evaluation is:
Prob{P∈[0.0283;0.0317]} ≅ 0.6827
As you see, a more precise evaluation can be made only with less certainty.
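
The certainty level for any margin of error can be computed from the Normal distribution rather than looked up. The sketch below assumes the Normal approximation for η and the rounded value σ = 0.0017 from Problem 1; the helper name `certainty_level` is illustrative.

```python
import math

def certainty_level(margin, sigma):
    """Probability that a Normal variable lies within `margin` of its expectation."""
    # For a Normal variable, P(|X - E(X)| <= k*sigma) = erf(k / sqrt(2)).
    return math.erf(margin / (sigma * math.sqrt(2)))

sigma_eta = 0.0017  # sample-based standard deviation of eta, rounded as in the text

one_sigma = certainty_level(0.0017, sigma_eta)  # interval [0.0283; 0.0317]
two_sigma = certainty_level(0.0034, sigma_eta)  # interval [0.0266; 0.0334]
print(f"margin 0.0017: {one_sigma:.4f}")
print(f"margin 0.0034: {two_sigma:.4f}")
```

Halving the margin of error from 2σ to σ lowers the certainty level noticeably, which is the trade-off this problem illustrates.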

Problem 3

Now our purpose is to determine the volume N of the sample set of parts required to evaluate the probability of manufacturing a defective part within a margin of error Δ=0.001 with certainty level p=0.9545.

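By the 2σ rule, certainty level 0.9545 corresponds to a margin of error Δ=2σ, so the standard deviation σ of the frequency must not exceed Δ/2=0.0005. A crude evaluation of this standard deviation σ based on its upper bound 1/(2√N) gives us the value of N as a solution to the equation
1/(2√N) = 0.0005
which gives √N=1000, that is, N=1000²=1,000,000.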
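
The crude sample-size calculation above generalizes to any margin of error. The following sketch covers only the upper-bound method, not the refined solution from the video; the function name `crude_sample_size` and the parameter `k` (number of standard deviations, 2 for certainty level 0.9545) are illustrative.

```python
import math

def crude_sample_size(delta, k=2.0):
    """Smallest N with k * 1/(2*sqrt(N)) <= delta, using the bound sigma <= 1/(2*sqrt(N))."""
    sigma_max = delta / k                 # largest admissible standard deviation of eta
    n = (1 / (2 * sigma_max)) ** 2        # solve 1/(2*sqrt(N)) = sigma_max for N
    return math.ceil(n - 1e-9)            # small tolerance guards against float round-off

print(crude_sample_size(0.001))  # margin 0.001 at certainty level 0.9545
print(crude_sample_size(0.01))   # a ten times wider margin needs 100 times fewer parts
```

Because N grows as 1/Δ², each extra decimal digit of precision costs a hundredfold increase in the number of experiments, which is why a less crude bound is worth pursuing.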

We would like to use a more precise sample variance evaluation instead of a crude upper bound of it to reduce the required number of experiments, but the problem is - we don't have a sample yet, we want to evaluate its volume before we do real experiments.
Watch the video for a suggested solution.