Monday, October 27, 2014

Unizor - Probability - Binomial Distribution

Let's recall the definition of a binomial distribution with two parameters: the number of simultaneous independent Bernoulli trials N and the probability of SUCCESS in each of them p.
The Binomial distribution of probabilities is the distribution of a random variable ξ[N,p] that is equal to the number of SUCCESSful Bernoulli trials. This random variable, obviously, takes values from 0 to N with different probabilities, and the Binomial distribution of probabilities is the distribution of probabilities among these N+1 values.

Consider a case with N=3, that is, we make three independent Bernoulli trials and count the number of SUCCESSes ξ[3,p]. The number of SUCCESSes in this case is 0, 1, 2 or 3. There are eight different outcomes of these three Bernoulli trials:
(F,F,F) and ξ[3,p](F,F,F)=0
(F,F,S) and ξ[3,p](F,F,S)=1
(F,S,F) and ξ[3,p](F,S,F)=1
(F,S,S) and ξ[3,p](F,S,S)=2
(S,F,F) and ξ[3,p](S,F,F)=1
(S,F,S) and ξ[3,p](S,F,S)=2
(S,S,F) and ξ[3,p](S,S,F)=2
(S,S,S) and ξ[3,p](S,S,S)=3
Since the probability of SUCCESS in any single independent Bernoulli trial is p and the probability of FAILURE is q=1−p, we can conclude that
P(ξ[3,p]=0) = P(F,F,F) = q^3
P(ξ[3,p]=1) = P(F,F,S)+P(F,S,F)+P(S,F,F) = 3·p·q^2
P(ξ[3,p]=2) = P(S,S,F)+P(S,F,S)+P(F,S,S) = 3·p^2·q
P(ξ[3,p]=3) = P(S,S,S) = p^3
Just for checking, the sum of all the probabilities must be equal to 1. Indeed,
q^3 + 3·p·q^2 + 3·p^2·q + p^3 = (q+p)^3 = 1^3 = 1
If p=1/2 and we want to represent this distribution of probabilities graphically, we can construct four rectangles on the coordinate plane, one for each value of ξ[3,p]: the first with a base [0,1] and the height q^3=1/8, the second with a base [1,2] and the height 3·p·q^2=3/8, the third with a base [2,3] and the height 3·p^2·q=3/8, and the fourth with a base [3,4] and the height p^3=1/8.
Notice that the probabilities of ξ[3,p] taking different values are the terms of the expansion of the expression (p+q)^3.
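The N=3 case above can be checked by brute force. The following is a minimal Python sketch, not part of the original lecture; the function name distribution_by_enumeration is made up for illustration, and exact fractions are used so that the results can be compared with 1/8 and 3/8 without rounding.

```python
from fractions import Fraction
from itertools import product

def distribution_by_enumeration(n, p):
    """Return {k: P(exactly k SUCCESSes)} by listing all 2**n outcomes.

    Illustrative helper (hypothetical name): each outcome is a tuple of
    'S' and 'F', and its probability is p**(number of S) * q**(number of F).
    """
    q = 1 - p
    dist = {k: 0 for k in range(n + 1)}
    for outcome in product("FS", repeat=n):
        k = outcome.count("S")           # value of the random variable
        dist[k] += p**k * q**(n - k)     # add this elementary event
    return dist

dist = distribution_by_enumeration(3, Fraction(1, 2))
for k, prob in dist.items():
    print(k, prob)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```

With p=1/2 the four probabilities are the heights of the four rectangles described above, and they add up to 1.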

It's time to consider a general case.
Let ξ[N,p] represent the number of SUCCESSes in N independent Bernoulli trials with a probability of SUCCESS in each of them equal to p.
We want to find the probability of this random variable to take a value K, where K can be any number from 0 (all FAILUREs) to N (all SUCCESSes).
In order to have exactly K SUCCESSes as a result of this experiment, K Bernoulli trials must be SUCCESSful and the other N−K trials must be FAILUREs. Since we don't care exactly which trials succeed and which fail, to calculate our probability we have to sum the probabilities of all elementary events that contain K SUCCESSes and N−K FAILUREs.
The number of such elementary events equals the number of combinations of K objects chosen from N, that is C(N,K).
Because the trials are independent, the probability of each elementary event that contains K SUCCESSes and N−K FAILUREs equals p^K·q^(N−K). Multiplying the number of such events by the probability of each of them, we obtain
P(ξ[N,p]=K) = C(N,K)·p^K·q^(N−K)
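As an illustration, the formula above translates directly into a few lines of Python. This is only a sketch: the name binomial_pmf is an assumption made for this example, and math.comb (available in Python 3.8 and later) plays the role of C(N,K).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, k, p):
    """P(exactly k SUCCESSes in n trials) = C(n,k) * p**k * q**(n-k).

    Illustrative helper (hypothetical name); p may be a float or a Fraction.
    """
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# N=3, K=1, p=1/2 reproduces the value 3*p*q^2 from the example with three trials
print(binomial_pmf(3, 1, Fraction(1, 2)))    # 3/8

# N=10, K=3, p=1/2: C(10,3)=120 outcomes, each with probability 1/1024
print(binomial_pmf(10, 3, Fraction(1, 2)))   # 15/128
```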

Incidentally, recall Newton's binomial formula presented in this course in the chapter on mathematical induction:
(a+b)^n = Σi∈[0,n][C(n,i)·a^(n−i)·b^i]
(summation by i from 0 to n).
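As you see, if we set a=q, b=p and n=N, the terms of the binomial formula are exactly the individual probabilities of the random variable that has the binomial distribution of probabilities: the term with i=K is C(N,K)·q^(N−K)·p^K = P(ξ[N,p]=K). Since p+q=1, the sum of all these probabilities is (q+p)^N = 1, as it should be. That's why our random variable's distribution of probabilities is called Binomial.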
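The connection with the binomial formula can be verified numerically. The sketch below is an illustration only (the name binomial_terms and the sample values N=4, p=1/3 are chosen for this example): it lists the terms of (a+b)^n and, with a=q and b=p, checks that they add up to 1.

```python
from fractions import Fraction
from math import comb

def binomial_terms(a, b, n):
    """Terms C(n,i) * a**(n-i) * b**i of (a+b)**n for i = 0..n.

    Illustrative helper (hypothetical name).
    """
    return [comb(n, i) * a**(n - i) * b**i for i in range(n + 1)]

p = Fraction(1, 3)
q = 1 - p

# With a=q and b=p the i-th term is the probability of exactly i SUCCESSes
terms = binomial_terms(q, p, 4)
print(terms)
print(sum(terms))   # 1, because (q+p)^4 = 1^4
```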

Mean (Expectation) and Variance

In the lecture dedicated to the definition of the distribution of probabilities we defined the Bernoulli and Binomial distributions and mentioned that a Binomial random variable with parameters N and p can be considered as a sum of N independent Bernoulli random variables with a probability of SUCCESS p.

Therefore, using the additive property of the expected value (mean) of a sum of random variables, we can derive the expected value of the Binomial random variable ξ[N,p]. It is just the sum of the expected values of N Bernoulli random variables with a probability of SUCCESS p. Since the expected value (mean) of such a Bernoulli random variable equals p, the expected value of our Binomial random variable is
E(ξ[N,p]) = N·p
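This result can be cross-checked against the definition of the expected value as the sum of K·P(ξ[N,p]=K) over all K from 0 to N. The sketch below is an illustration under assumed sample values N=10 and p=3/10; the name binomial_mean is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_mean(n, p):
    """Expected value computed from the definition: sum of k * P(k).

    Illustrative helper (hypothetical name), not a shortcut via n*p.
    """
    q = 1 - p
    return sum(k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

n, p = 10, Fraction(3, 10)
mean = binomial_mean(n, p)
print(mean)     # 3
print(n * p)    # 3, the same value as given by N*p
```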

Similarly, we know that the variance of a sum of independent random variables equals the sum of their variances.

As mentioned many times above, a Binomial random variable ξ[N,p] is not just a sum of N Bernoulli random variables with a probability of SUCCESS p, but a sum of N independent Bernoulli random variables.
The variance of each such Bernoulli random variable, as we know, is p·q, where q=1−p.
Therefore, the variance of our Binomial random variable is
Var(ξ[N,p]) = N·p·q

From this we can derive the standard deviation of the Binomial random variable ξ[N,p]:
σ(ξ[N,p]) = √(N·p·q)
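As a check of the variance and standard deviation formulas, the variance can also be computed from its definition as the expected squared deviation from the mean. The sketch below is an illustration with assumed sample values N=10 and p=3/10; the name binomial_variance is hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_variance(n, p):
    """Variance computed from the definition: sum of (k - mean)**2 * P(k).

    Illustrative helper (hypothetical name), not a shortcut via n*p*q.
    """
    q = 1 - p
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

n, p = 10, Fraction(3, 10)
variance = binomial_variance(n, p)
print(variance)          # 21/10, the same as N*p*q = 10 * 3/10 * 7/10
print(sqrt(variance))    # standard deviation, about 1.449
```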
