Wednesday, October 29, 2014

Unizor - Probability - Geometric Distribution - Definition





The Geometric distribution of probabilities is related to Bernoulli trials. Its definition is based on the concept of the first SUCCESS: the number of experiments needed to reach the first SUCCESSful result in a series of Bernoulli trials. In theory, if FAILURE occurs time after time (which is not impossible if the probability of FAILURE is not zero), the number of experiments needed to reach the first SUCCESS can be arbitrarily large.

We will analyze the behavior of this number. More precisely, we will analyze the behavior of a random variable equal to the number of experiments needed to reach the first SUCCESS in a series of independent Bernoulli trials with a probability of SUCCESS p and, correspondingly, a probability of FAILURE q=1−p.

Example 1
Suppose you want to win a lottery, but your inner voice tells you that a lottery is a form of gambling and, on average, you would lose money. Still, you want to try it, and you make an agreement with your inner voice to play as many times as needed until the first win. After that - no lottery gambling.
This is a typical example of the Geometric distribution. The number of tickets you have to buy to reach the first win is exactly the random variable we talked about in the above definition.

Example 2
Assume a couple wants to have a daughter. If a son is born, they try again and again until a daughter is born. If the probability of giving birth to a daughter is not zero, this process will, with probability 1, end after some finite number of trials. The number of children born in this family up to and including the first daughter is a random variable with Geometric distribution of probabilities.

Example 3
A student wants to pass a test that consists of a certain number of questions. He knows the answers to some of them, but not all. On each attempt he gets one question; if it is a question he knows, he passes, otherwise he has to come back and take the test again.
Let's assume that the student does not study in between the tests, so the probability of getting a familiar question is the same on every attempt. Then the number of attempts he makes to pass the test is a random variable with Geometric distribution of probabilities.

Now we are ready to precisely define the Geometric distribution.
Our sample space of elementary events can be represented as a set of strings of letters S (for a Bernoulli trial that results in SUCCESS) and F (for a trial that results in FAILURE). Each string has a certain number of letters F (possibly none) at the beginning and a single letter S at the end. This models a series of independent Bernoulli trials that lasts until the result is SUCCESS.

The random variable with Geometric distribution in this model, that is, a numeric function defined for each elementary event, is the length of the string that represents this elementary event. Since every string ends with a letter S, this length is at least 1. Our task is to determine the probability that this length equals some positive integer value.
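
To make this model more tangible, here is a minimal Python sketch that generates such strings. The function name sample_event, the fixed seed and the value p=0.4 are illustrative assumptions, not part of the definition, and the sketch assumes p is greater than zero (otherwise the loop would never end).

```python
import random

def sample_event(p, rng):
    """Return one elementary event as a string such as 'FFS'."""
    letters = []
    # Repeat independent Bernoulli trials until the first SUCCESS.
    # rng.random() is uniform on [0, 1), so it is >= p with probability 1 - p.
    while rng.random() >= p:
        letters.append("F")  # this trial was a FAILURE
    letters.append("S")      # the first SUCCESS ends the string
    return "".join(letters)

rng = random.Random(2014)    # fixed seed so that runs are repeatable
events = [sample_event(0.4, rng) for _ in range(5)]

# The random variable is the length of each string.
lengths = [len(event) for event in events]
print(events)
print(lengths)
```

Each generated string consists of letters F followed by a single S, and the printed lengths are sample values of the random variable we are about to study.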

The probability that our random variable equals some number K is the probability of having K−1 FAILUREs in a series of independent Bernoulli trials, followed by a SUCCESS.
If the result of the i-th Bernoulli trial is a random variable β[i] with values S (SUCCESS) or F (FAILURE), we have to determine the following probability:
P(β[1]=F AND β[2]=F AND...AND β[K−1]=F AND β[K]=S)
As we know, the probability of a combination of independent events equals the product of their individual probabilities. Therefore, our probability equals
P(β[1]=F)·P(β[2]=F)·...·P(β[K−1]=F)·P(β[K]=S)
which, in turn, equals
(1−p)^(K−1) · p

Let's denote our random variable with Geometric distribution as γ[p] (the index p signifies the probability of SUCCESS in each Bernoulli trial that participates in the definition of the Geometric distribution). Then, for K = 1, 2, 3 and so on, we can describe the distribution of probabilities for this random variable as
P(γ[p]=K) = (1−p)^(K−1) · p
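
As a small illustration, the formula above can be written as a Python function. This is only a sketch: the name geometric_pmf and the decision to return 0 for K less than 1 are our own assumptions.

```python
def geometric_pmf(k, p):
    """Return P(gamma[p] = k) = (1 - p)^(k - 1) * p for k = 1, 2, 3, ..."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    if k < 1:
        return 0.0  # the first SUCCESS cannot come before the first trial
    # k - 1 FAILUREs, each with probability 1 - p, followed by one SUCCESS
    return (1 - p) ** (k - 1) * p

print(geometric_pmf(2, 0.5))  # one FAILURE, then a SUCCESS: 0.5 * 0.5
```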

The above formula completely defines the random variable with Geometric distribution and will be used in further analysis of its properties and characteristics.

For example, let's calculate the probability of winning a lottery on the first attempt (K=1) if the probability of winning for a single lottery ticket is p=0.4:
P(γ[0.4]=1)=(1−0.4)^(1−1) · 0.4=0.4
How about winning on the 3rd attempt (K=3)?
P(γ[0.4]=3)=(1−0.4)^(3−1) · 0.4=0.144
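
The same arithmetic can be repeated for several values of K at once. The sketch below uses exact fractions (p=0.4 written as 2/5) to avoid rounding; the variable names and the range of K from 1 to 5 are arbitrary choices made for illustration.

```python
from fractions import Fraction

p = Fraction(2, 5)  # p = 0.4 written as an exact fraction
q = 1 - p           # probability of FAILURE, 3/5

# K -> P(gamma[0.4] = K) = q^(K - 1) * p
table = {K: q ** (K - 1) * p for K in range(1, 6)}

for K, prob in table.items():
    print(K, prob, float(prob))
```

For K=3 this gives 18/125, which is the value 0.144 obtained above.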

In conclusion, let's check that the sum of the probabilities of our random variable γ[p] taking each of its possible values equals 1.
All we have to do is to sum the expression (1−p)^(K−1) · p over all K from 1 to infinity. This is an infinite geometric series. Its sum depends only on the first term a (which is equal to p) and the common ratio r (which is equal to 1−p). Since 0 ≤ r < 1 whenever p > 0, the series converges, and its sum is expressed as
S = a/(1−r)
In our case
S = p/[1-(1-p)] = p/p = 1
So, our definition of the Geometric distribution of probabilities satisfies the necessary condition that the probabilities sum to 1.
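
This can also be checked numerically. The sketch below adds up the first n probabilities and compares the result with 1 − (1−p)^n, which is the standard formula for a partial sum of this geometric series; the function name partial_sum and the chosen values of n are assumptions made for illustration.

```python
def partial_sum(p, n):
    """Sum of (1 - p)^(K - 1) * p for K = 1, ..., n."""
    return sum((1 - p) ** (k - 1) * p for k in range(1, n + 1))

p = 0.4
for n in (1, 5, 10, 50):
    # The two printed values agree up to rounding and approach 1 as n grows.
    print(n, partial_sum(p, n), 1 - (1 - p) ** n)
```

The remainder (1−p)^n shrinks to zero as n grows, which is another way to see that the full sum is 1.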
