Wednesday, October 22, 2014

Unizor - Probability - Bernoulli Distribution

Let's examine the properties of the Bernoulli distribution of probabilities: the distribution of a random variable ξ defined on a space of only two elementary events, which we call SUCCESS (with probability measure p) and FAILURE (with probability measure q=1−p), and taking, correspondingly, two values
ξ(SUCCESS) = 1 and
ξ(FAILURE) = 0.
So, we can say that our random variable ξ takes the value 1 with probability p and the value 0 with probability q=1−p:
P(ξ=1) = p and
P(ξ=0) = q = 1−p
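This two-valued distribution is easy to sketch in code. Below is a minimal Python illustration (the names `bernoulli_sample` and `distribution` are my own, not from the lecture): one function draws a sample that equals 1 with probability p and 0 otherwise, and a dictionary records the distribution table P(ξ=1)=p, P(ξ=0)=q.

```python
import random

def bernoulli_sample(p):
    # Return 1 (SUCCESS) with probability p, else 0 (FAILURE).
    return 1 if random.random() < p else 0

# The full distribution table of xi: P(xi=1) = p, P(xi=0) = q = 1 - p
p = 0.3
distribution = {1: p, 0: 1 - p}
```

Note that the two probabilities in the table always sum to 1, mirroring the fact that p + q = 1.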

The graphical representation of this distribution of probabilities is simple. We build one rectangle with the segment [0,1] as its base and height p, representing the probability of our random variable ξ taking its first value of 1, and another rectangle with the segment [1,2] as its base and height q, representing the probability of ξ taking its second value of 0.
The total area of these two rectangles is, obviously, equal to 1, as it should be, since the sum of the probabilities of all possible values of ξ must equal 1.

The next task is to calculate the expected value, or mean, of our random variable ξ.
Recall that, if a random variable takes certain discrete values with known probabilities, its expectation is a weighted average of its values with probabilities as weights.
In our case there are only two values, 1 and 0, that the random variable ξ takes with probabilities p and q=1−p, correspondingly. So, the expectation (or mean) of such a random variable is
E(ξ) = 1·p + 0·q = p
This value of expectation is measured in the same units as the values of our random variable. For example, if the two values 1 and 0 are dollars, the expectation is p dollars. It is just a numerical coincidence that the expectation and the probability of the value 1 are both equal to p: probability is a measure of frequency and, as such, has no units of measurement, while the expectation is measured in the units of the random variable itself.

This result was easily predictable based on the statistical meaning of probabilities. If we repeat our Bernoulli trial N times, approximately p·N of the results will be SUCCESSes, where ξ equals 1, and the remaining q·N results will be FAILUREs, where ξ equals 0. The precision of this approximation improves as the number of trials grows.
Therefore, the average value of our random variable in N experiments will be approximately equal to
Ave(ξ) = (1·p·N + 0·q·N)/N = p
As the number of experiments grows, this statistical average approaches p ever more closely, and in the limit the equality is exact. So, the expectation of a random variable is its statistical average value as the number of experiments grows to infinity.
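This convergence of the statistical average to p is easy to observe by simulation. Here is a small Python sketch (the function name and the fixed seed are my own choices, added for reproducibility): it averages n Bernoulli(p) samples, and for large n the result comes out close to p.

```python
import random

def bernoulli_average(p, n, seed=1):
    # Average of n independent Bernoulli(p) samples.
    # By the reasoning above, it approaches E(xi) = p as n grows.
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n
```

For example, with p = 0.4 and n = 100,000 trials the average lands within about one percent of 0.4, while for a handful of trials it can deviate noticeably.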

The expectation, or mean value, of a random variable is arguably its most important characteristic. Next in importance comes its standard deviation, which is the square root of its variance, which, in turn, is the weighted average of squared deviations of the values of a random variable from its expectation.

Let's calculate these important characteristics.
There are two values our random variable ξ takes: 1 with probability p and 0 with probability q. Its mean value equals p. Therefore, the weighted average of squared deviations of its values from its expectation equals
Var(ξ) = (1−p)^2·p+(0−p)^2·q =
= (1−2·p+p^2)·p+p^2·q =
= p−2·p^2+p^3+p^2·(1−p) =
= p−p^2 = p·(1−p) =
= p·q since q=1−p.

From Var(ξ) = p·q we derive the standard deviation of our random variable
σ(ξ) = √(p·q) = √[p·(1−p)]
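The derivation above can be checked numerically. In this Python sketch (the function names are illustrative), the variance is computed straight from the definition as the weighted average of squared deviations from the mean, and it agrees with the closed form p·q:

```python
from math import sqrt, isclose

def bernoulli_var(p):
    # Variance from the definition: weighted average of squared
    # deviations of the values 1 and 0 from the mean E(xi) = p.
    q = 1 - p
    return (1 - p) ** 2 * p + (0 - p) ** 2 * q

def bernoulli_std(p):
    # Standard deviation is the square root of the variance.
    return sqrt(bernoulli_var(p))

# Agrees with the closed form Var(xi) = p*q:
assert isclose(bernoulli_var(0.3), 0.3 * 0.7)
```

Note that p·(1−p) is largest at p = 1/2, so a Bernoulli variable is most "spread out" when SUCCESS and FAILURE are equally likely.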
The standard deviation, measured in the same units as the random variable itself and its expectation (mean), is a good measure of how widely the values of a random variable are spread around its mean.

Example

Consider a lottery where you try to guess 6 numbers out of 49. Assume, for simplicity, that you win (a SUCCESS) if you guessed at least 4 numbers and you don't win (a FAILURE) if you guessed 3 numbers or fewer. Using the results of a calculation made in one of the previous lectures, the probability of SUCCESS is about 0.001.
Assume further that in case of SUCCESS you win a prize of $1000 (which we take to be 1, measured in thousands of dollars) and get nothing in case of FAILURE.
What are the expectation and standard deviation of your winning?

Since the probability of SUCCESS equals 0.001, the mean winning equals 0.001 (measured in thousands of dollars), that is, $1.
The standard deviation equals
√(0.001·0.999) ≅ 0.0316
(also in thousands of dollars), which is about $32.
So, your average win will be $1, but deviations from it are quite substantial: around $32 on average, and obviously they might be much greater than that.
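The lottery arithmetic above fits in a few lines of Python (the variable names are my own; the prize is expressed in thousands of dollars, as in the example):

```python
from math import sqrt

p = 0.001    # probability of SUCCESS (winning), from the lecture
prize = 1.0  # prize of $1000, measured in thousands of dollars

mean = prize * p                 # expected winning: 0.001 thousand = $1
std = prize * sqrt(p * (1 - p))  # standard deviation: about 0.0316 thousand, i.e. about $32
```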
