Unizor - Creative Mind through Art of Mathematics: October 2014

Wednesday, October 29, 2014

Unizor - Probability - Geometric Distribution - Properties

Definition

Recall the definition of the Geometric distribution of probabilities.
Assume that we conduct a sequence of independent random experiments - Bernoulli trials with the probability of SUCCESS p - with the goal to reach the first SUCCESS. The number of trials to achieve this goal is, obviously, a random variable. The distribution of probabilities of this random variable is called Geometric.

Formula for Distribution of Probabilities

Recall from a previous lecture the formula for the probability of a random variable distributed Geometrically to take a value of K:
P(γ[p]=K) = (1−p)^(K−1)·p

Graphical Representation

Our random variable can take any integer value from 1 to infinity with the probability expressed by the above formula.
The graphical representation of this distribution of probabilities consists of a sequence of rectangles with the bases [0,1], [1,2], etc. and the height of the Kth rectangle equal to (1−p)^(K−1)·p.
This resembles a staircase with gradually decreasing height of the steps from p for the first down to 0 as we move farther and farther from the beginning.

Expectation (Mean)

The expectation of a random variable that takes values x1, x2, etc. with probabilities p1, p2, etc. is a weighted average of its values with probabilities as weights:
E = x1·p1+x2·p2+...

In our case, since the random variable can take any integer value from 1 to infinity with the probability described by the formula above, its expectation equals
E(γ[p]) =
1·(1−p)^0·p +
2·(1−p)^1·p +
3·(1−p)^2·p +
4·(1−p)^3·p +...

To calculate the value of this expression, multiply both sides by the factor (1−p):
(1−p)·E(γ[p]) =
1·(1−p)^1·p +
2·(1−p)^2·p +
3·(1−p)^3·p+...
and subtract the result from the original sum
E(γ[p]) − (1−p)·E(γ[p]) =
(1−p)^0·p +
(1−p)^1·p +
(1−p)^2·p +...

On the left of this equation we have p·E(γ[p]).
On the right we have a geometric series that converges to p/[1−(1−p)]=1.

Therefore, p·E(γ[p]) = 1
and the mean value (expectation) of γ[p] is
E(γ[p]) = 1/p
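The result E(γ[p]) = 1/p can be checked numerically by adding up the first terms of the series above. The following is only a sketch: the function name geometric_mean_partial_sum and the number of terms are our own choices for illustration, not part of the lecture.

```python
def geometric_mean_partial_sum(p, terms):
    """Add up K * (1 - p)**(K - 1) * p for K = 1..terms."""
    if not 0 < p <= 1:
        raise ValueError("p must be greater than 0 and at most 1")
    total = 0.0
    weight = p  # probability of K = 1, that is (1 - p)**0 * p
    for k in range(1, terms + 1):
        total += k * weight
        weight *= 1 - p  # probability of the next value K + 1
    return total


# Compare the partial sums with 1/p for a few probabilities of SUCCESS.
for p in (0.5, 0.25, 0.1):
    print(p, geometric_mean_partial_sum(p, 2000), 1 / p)
```

For each p the partial sum should be very close to 1/p, because the terms we leave out decrease geometrically.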

This value of the expectation is intuitively correct since, when the probability of SUCCESS is greater, we expect to get it sooner, in a smaller number of trials on average. Also, if the probability of SUCCESS equals 1, which means that we cannot get FAILURE at all, we expect to get SUCCESS on the first trial. Finally, as the probability of SUCCESS diminishes, we expect to get it later on average, after more and more trials.

Variance, Standard Deviation

Variance of a random variable that takes values x1, x2, etc. with probabilities p1, p2, etc. is a weighted average of squares of deviations of the values of our random variable from its expected value E:
Var = (x1−E)^2·p1+(x2−E)^2·p2+...

In our case, since the random variable can take any integer value from 1 to infinity with the probability described by the formula above and its expectation equals 1/p, the variance equals
Var(γ[p]) =
(1−1/p)^2·(1−p)^0·p +
(2−1/p)^2·(1−p)^1·p +
(3−1/p)^2·(1−p)^2·p +
(4−1/p)^2·(1−p)^3·p...

Reducing this complex expression to a short form involves a lot of calculations. Thankfully, it was done by mathematicians and documented numerous times. The idea is similar to what we used for calculating the mean value - multiplying the sum by the common ratio (1−p) of the geometric progression and subtracting the resulting sum from the original. The final formula is:
Var(γ[p]) = (1−p)/(p^2)

For values of p greater than 0 and not exceeding 1 this expression is never negative. It is positive for p less than 1 and equals 0 when the probability of SUCCESS equals 1 (as it should, since we then always have the SUCCESS on the first trial with no deviation from this). As the probability of SUCCESS diminishes to 0, the variance increases to infinity (as it should, since on average it would take more and more trials to reach the SUCCESS).

As for the standard deviation, it is equal to a square root of the variance:
σ(γ[p]) = [√(1−p)]/p
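Since the derivation of the variance is skipped here, a numerical check is a reasonable substitute. The sketch below adds up the first terms of the variance series and compares the result with (1−p)/(p^2). The function name geometric_variance_partial_sum and the chosen p are assumptions made for illustration.

```python
import math


def geometric_variance_partial_sum(p, terms):
    """Add up (K - 1/p)**2 * (1 - p)**(K - 1) * p for K = 1..terms."""
    mean = 1 / p
    total = 0.0
    weight = p  # probability of K = 1
    for k in range(1, terms + 1):
        total += (k - mean) ** 2 * weight
        weight *= 1 - p  # probability of the next value K + 1
    return total


p = 0.25
variance = geometric_variance_partial_sum(p, 2000)
print(variance, (1 - p) / p**2)                      # partial sum vs. formula
print(math.sqrt(variance), math.sqrt(1 - p) / p)     # standard deviation
```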

Unizor - Probability - Geometric Distribution - Definition

The Geometric distribution of probabilities is related to Bernoulli trials. Its definition is based on a concept of the first SUCCESS - the number of experiments needed to reach the first SUCCESSful result of Bernoulli trials. In theory, if FAILURE occurs time after time (which is not impossible if the probability of FAILURE is not zero), the number of experiments needed to reach the first SUCCESS can be arbitrarily large.

We will analyze the behavior of this number, more precisely, we will analyze the behavior of a random variable equal to the number of experiments needed to reach the first SUCCESS in a series of independent Bernoulli trials with a probability of SUCCESS p and, correspondingly, a probability of FAILURE q=1−p.

Example 1
Suppose you want to win a lottery, but your inner voice tells you that lottery is a form of gambling and, on average, you would lose money. Still, you want to try it and you make an agreement with your inner voice to play as many times as needed until the first win. After that - no lottery gambling.
This is a typical example of the Geometric distribution. The number of tickets you have to buy to reach the first win is exactly that random variable we talked about in the above definition.

Example 2
Assume a couple wants to have a daughter. If a son is born, they try again and again until a daughter is born. If the probability of giving birth to a daughter is not zero, this process will end after some number of trials. The number of children born in this family until they have a daughter is a random variable with Geometric distribution of probabilities.

Example 3
A student wants to pass a test that consists of a certain number of questions. He knows answers to some of them, but not all. If he gets a question he knows, he will pass, otherwise he has to come again for a test.
Let's assume that a student does not study in between the tests, so the probability of getting a familiar question is the same. Then the number of attempts he makes to pass the test is a random variable with Geometric distribution of probabilities.

Now we are ready to precisely define the Geometric distribution.
Our sample space of elementary events can be represented as a set of strings of letters S (for a Bernoulli trial that results in SUCCESS) and F (for FAILUREs). Each string has a certain number of letters F (possibly none) in the beginning and a single letter S at the end. This models a series of independent Bernoulli trials that last until the result is SUCCESS.

The random variable with Geometric distribution in this model, that is a numeric function defined for each elementary event, is the length of a string that represents this elementary event. Our task is to determine the probability of this length to be equal to some positive integer value.

The probability of our random variable to be equal to some number K is the probability to have K−1 FAILUREs in a series of independent Bernoulli trials followed by a SUCCESS.
If the result of the ith Bernoulli trial is a random variable β[i] with values S (SUCCESS) or F (FAILURE), we have to determine the following probability:
P(β[1]=F AND β[2]=F AND...AND β[K−1]=F AND β[K]=S)
As we know, the probability of a combination of independent events equals the product of their individual probabilities. Therefore, our probability equals
P(β[1]=F)·P(β[2]=F)·...·P(β[K−1]=F)·P(β[K]=S)
which, in turn, equals
(1−p)^(K−1) · p

Let's denote our random variable with Geometric distribution γ[p] (index p signifies the probability of SUCCESS in each Bernoulli trial that participates in the definition of the Geometric distribution). Then we can describe the distribution of probabilities for this random variable as
P(γ[p]=K) = (1−p)^(K−1) · p

The above formula completely defines the random variable with Geometric distribution and will be used in further analysis of its properties and characteristics.

For example, let's calculate a probability of winning a lottery on the first trial (K=1) if the probability of winning for a single lottery ticket is p=0.4:
P(γ[0.4]=1)=(1−0.4)^(1−1) · 0.4=0.4
How about winning on the third attempt (K=3)?
P(γ[0.4]=3)=(1−0.4)^(3−1) · 0.4=0.144
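The two calculations above are easy to repeat for any p and K. Here is a minimal sketch. The function name geometric_pmf is our own and is used only for illustration.

```python
def geometric_pmf(p, k):
    """P(gamma[p] = k): k - 1 FAILUREs followed by a single SUCCESS."""
    if k < 1:
        return 0.0  # the first SUCCESS cannot happen before the first trial
    return (1 - p) ** (k - 1) * p


print(geometric_pmf(0.4, 1))  # winning on the first attempt
print(geometric_pmf(0.4, 3))  # winning on the third attempt
```

The printed values should agree with 0.4 and 0.144 up to the usual rounding of floating point numbers.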

In conclusion, let's check that the sum of probabilities of our random variable γ[p] to take all possible values equals 1.
All we have to do is to sum the expression (1−p)^(K−1) · p over all K from 1 to infinity. Obviously, this is an infinite geometric series. Its sum depends only on the first term a (which is equal to p) and the common ratio r (which is equal to 1−p) and is expressed as
S = a/(1−r)
In our case
S = p/[1-(1-p)] = p/p = 1
So, our definition of Geometric distribution of probabilities satisfies a necessary condition to sum up to 1.
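The same check can be sketched numerically. The sum of the first n probabilities is a finite geometric sum equal to 1−(1−p)^n, which is also the probability that the first SUCCESS happens within n trials, and it approaches 1 as n grows. The function name geometric_partial_total is an assumption made for this illustration.

```python
def geometric_partial_total(p, n):
    """Add up (1 - p)**(K - 1) * p for K = 1..n."""
    total = 0.0
    for k in range(1, n + 1):
        total += (1 - p) ** (k - 1) * p
    return total


p = 0.4
for n in (1, 5, 10, 50):
    # the second number is the closed form of the same finite sum
    print(n, geometric_partial_total(p, n), 1 - (1 - p) ** n)
```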

Monday, October 27, 2014

Unizor - Probability - Binomial Distribution

Let's recall the definition of a binomial distribution with two parameters: the number of simultaneous independent Bernoulli trials N and the probability of SUCCESS in each of them p.
The Binomial distribution of probabilities is a distribution of a random variable ξ[N,p] that is equal to the number of SUCCESSful Bernoulli trials. This random variable, obviously, takes values from 0 to N with different probabilities, and the Binomial distribution of probabilities is a distribution of probabilities among these N+1 values.

Example
Consider a case with N=3, that is we make three independent Bernoulli trials and count the number of SUCCESSes ξ[3,p]. The number of SUCCESSes in this case is either 0 or 1, or 2, or 3. There are eight different outcomes of these three Bernoulli trials:
(F,F,F) and ξ[3,p](F,F,F)=0
(F,F,S) and ξ[3,p](F,F,S)=1
(F,S,F) and ξ[3,p](F,S,F)=1
(F,S,S) and ξ[3,p](F,S,S)=2
(S,F,F) and ξ[3,p](S,F,F)=1
(S,F,S) and ξ[3,p](S,F,S)=2
(S,S,F) and ξ[3,p](S,S,F)=2
(S,S,S) and ξ[3,p](S,S,S)=3
Since the probability of SUCCESS in any single independent Bernoulli trial is p and the probability of FAILURE is q=1−p, we can conclude that
P(ξ[3,p]=0) = P(F,F,F) = q^3
P(ξ[3,p]=1) = P(F,F,S)+P(F,S,F)+P(S,F,F) = 3·p·q^2
P(ξ[3,p]=2) = P(S,S,F)+P(S,F,S)+P(F,S,S) = 3·p^2·q
P(ξ[3,p]=3) = P(S,S,S) = p^3
Just for checking, the sum of all the probabilities must be equal to 1. Indeed,
p^3+3·p^2·q+3·p·q^2+q^3=(p+q)^3=1
If p=1/2 and we want to represent this distribution of probabilities graphically, we can construct four rectangles on the coordinate plane: one with a base [0,1] and the height q^3=1/8 (no SUCCESSes), another with a base [1,2] and the height 3·p·q^2=3/8, the third with a base [2,3] and the height 3·p^2·q=3/8 and the fourth with a base [3,4] and the height p^3=1/8 (three SUCCESSes).
Notice that the probabilities of ξ[3,p] taking different values are the terms of the expansion of (p+q)^3.
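The enumeration of all eight outcomes can be repeated mechanically. The sketch below lists every string of S and F for a given number of trials and adds the probability of each string to the total for its number of SUCCESSes. The function name success_count_distribution is hypothetical.

```python
from itertools import product


def success_count_distribution(n, p):
    """Probabilities of 0, 1, ..., n SUCCESSes found by listing all outcomes."""
    q = 1 - p
    distribution = [0.0] * (n + 1)
    for outcome in product("FS", repeat=n):  # e.g. ('F', 'S', 'F')
        successes = outcome.count("S")
        # independent trials: multiply p for each S and q for each F
        distribution[successes] += p**successes * q ** (n - successes)
    return distribution


print(success_count_distribution(3, 0.5))
```

For N=3 and p=1/2 this should reproduce the heights 1/8, 3/8, 3/8 and 1/8 of the four rectangles.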

It's time to consider a general case.
Let ξ[N,p] represent the number of SUCCESSes in N independent Bernoulli trials with a probability of SUCCESS in each of them equal to p.
We want to find the probability of this random variable to take a value K, where K can be any number from 0 (all FAILUREs) to N (all SUCCESSes).
In order to have exactly K SUCCESSes as a result of this experiment we have to have K Bernoulli trials SUCCESSful and N−K trials FAILUREs. Since we don't care exactly which trials succeed and which fail, to calculate our probability, we have to sum the probabilities of all elementary events that contain K SUCCESSes and N−K FAILUREs.
The number of such elementary events equals the number of combinations of N objects taken K at a time, that is C(N,K).
The probability of each elementary event that contains K SUCCESSes and N−K FAILUREs equals p^K·q^(N−K).
Therefore,
P(ξ[N,p]=K) = C(N,K)·p^K·q^(N−K)
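This formula translates directly into a few lines of code. The sketch below relies on math.comb from the Python standard library (available since Python 3.8) for C(N,K). The function name binomial_pmf is our own.

```python
from math import comb


def binomial_pmf(n, p, k):
    """P(xi[N,p] = K) = C(N,K) * p**K * q**(N-K) with q = 1 - p."""
    if not 0 <= k <= n:
        return 0.0  # fewer than 0 or more than N SUCCESSes is impossible
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3
probabilities = [binomial_pmf(n, p, k) for k in range(n + 1)]
print(probabilities)
print(sum(probabilities))  # should be very close to 1
```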

Incidentally, recall Newton's binomial formula presented in this course in the chapter on mathematical induction:
(a+b)^n = Σ{i∈[0,n]}[C(n,i)·a^(n−i)·b^i]
(summation by i from 0 to n).
As you see, with a=q and b=p the terms of the binomial formula are exactly the individual probabilities of the random variable that has the binomial distribution of probabilities. That's why our random variable's distribution of probabilities is called Binomial.

Mean (Expectation) and Variance

In the lecture dedicated to a definition of the distribution of probabilities we defined Bernoulli and Binomial distributions and mentioned that a Binomial random variable with parameters N and p can be considered as a sum of N independent Bernoulli random variables with a probability of SUCCESS p.

Therefore, using the additive properties of the expected value (mean) of a sum of random variables, we can derive the expected value of the Binomial random variable ξ[N,p]. It is just a sum of expected values of N Bernoulli random variables with a probability of SUCCESS p. Since the expected value (mean) of such a Bernoulli random variable equals p, the expected value of our Binomial random variable is
E(ξ[N,p]) = N·p

Similarly, we know that the variance of a sum of independent random variables equals the sum of their variances.

As mentioned above, a Binomial random variable ξ[N,p] is not just a sum of N Bernoulli random variables with a probability of SUCCESS p, but a sum of N independent Bernoulli random variables.
The variance of each such Bernoulli random variable, as we know, is p·q, where q=1−p.
Therefore, the variance of our Binomial random variable is
Var(ξ[N,p]) = N·p·q

From this we can derive the standard deviation of Binomial random variable ξ[N,p]:
σ(ξ[N,p]) = √(N·p·q)
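These three formulas can be checked against the definitions of the mean and the variance as weighted averages. The following sketch computes both directly from the Binomial probabilities. The function name binomial_moments and the sample values of N and p are assumptions for illustration.

```python
from math import comb, sqrt


def binomial_moments(n, p):
    """Mean and variance computed directly from the Binomial probabilities."""
    q = 1 - p
    probabilities = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probabilities))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probabilities))
    return mean, variance


n, p = 20, 0.3
mean, variance = binomial_moments(n, p)
print(mean, n * p)                                # compare with N*p
print(variance, n * p * (1 - p))                  # compare with N*p*q
print(sqrt(variance), sqrt(n * p * (1 - p)))      # standard deviation
```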

Wednesday, October 22, 2014

Unizor - Probability - Bernoulli Distribution

Let's examine the properties of Bernoulli distribution of probabilities, a distribution of a random variable ξ, defined on a space of only two elementary events that we call SUCCESS (with a probability measure p) and FAILURE (with probability measure q=1−p), and taking, correspondingly, two values
ξ(SUCCESS) = 1 and
ξ(FAILURE) = 0.
So, we can say that our random variable ξ takes a value of 1 with probability p and a value of 0 with probability q=1−p:
P(ξ=1) = p and
P(ξ=0) = q = 1−p

Graphical representation of this distribution of probabilities is trivial. We build one rectangle with a segment [0,1] as a base and the height of p that represents the measure of probability of our random variable ξ to take its first value of 1 and another rectangle with a segment [1,2] as a base and the height of q that represents the measure of probability of ξ to take its second value of 0.
The total area of these two rectangles is, obviously, equal to 1, as it should be, since the sum of probabilities of ξ to take all possible values must be equal to 1.

The next task is a calculation of the expected value or mean of our random variable ξ.
Recall that, if a random variable takes certain discrete values with known probabilities, its expectation is a weighted average of its values with probabilities as weights.
In our case there are only two values, 1 and 0, that a random variable ξ takes with probabilities, correspondingly, p and q=1−p. So, the expectation (or mean) of such a random variable equals
E(ξ) = 1·p + 0·q = p
This value of expectation is measured in the same units as the values of our random variable. For example, if two values, 1 and 0, are dollars, the expectation is p dollars. In this case it's just a coincidence that both the expectation and probability of having a value of 1 are the same and equal to p. The probability is a measure of frequency and, as such, has no units of measurement, while the expectation has the measurement of the random variable itself.

This result was easily predictable based on the statistical meaning of probabilities. If we repeat our Bernoulli trial N times, the number of SUCCESSes, where ξ equals 1, will be approximately p·N and the remaining q·N results, where ξ equals 0, will be FAILUREs. The precision of this approximation is increasing with the number of trials increasing to infinity.
Therefore, the average value of our random variable in N experiments will be approximately equal to
Ave(ξ)=(1·p·N + 0·q·N)/N=p
As the number of experiments grows, the statistical average will equal p more and more precisely and, in a limit case, the equality will be absolutely precise. So, the expectation of a random variable is its statistical average value as the number of experiments grows to infinity.

The expectation or mean value of a random variable is, arguably, its most important characteristic. The next in importance comes its standard deviation, which is a square root of its variance, which, in turn, is a weighted average of squares of deviations of the values of a random variable from its expectation.

Let's calculate these important characteristics.
There are two values our random variable ξ takes, 1 with the probability p and 0 with the probability q. Its mean value is equal to p. Therefore, the weighted average of squares of deviations of its values from its expectation is equal to
Var(ξ) = (1−p)^2·p+(0−p)^2·q =
= (1−2·p+p^2)·p+p^2·q =
= p−2·p^2+p^3+p^2·(1−p) =
= p−p^2 = p·(1−p) =
= p·q since q=1−p.

From Var(ξ) = p·q we derive the standard deviation of our random variable
σ(ξ) = √(p·q) = √[p·(1−p)]
The standard deviation is a good measure of how widely the values of a random variable are spread around its mean. It is measured in the same units as the random variable itself and its expectation (mean).
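The definitions of the mean and the variance as weighted averages work for any random variable with a finite number of values, so they can be sketched once and then applied to the Bernoulli case. The function name mean_and_variance and the sample value p=0.3 are our own choices.

```python
from math import sqrt


def mean_and_variance(values, probabilities):
    """Weighted averages of the values and of the squared deviations."""
    mean = sum(x * w for x, w in zip(values, probabilities))
    variance = sum((x - mean) ** 2 * w for x, w in zip(values, probabilities))
    return mean, variance


p = 0.3
mean, variance = mean_and_variance([1, 0], [p, 1 - p])
print(mean, p)                                 # E = p
print(variance, p * (1 - p))                   # Var = p*q
print(sqrt(variance), sqrt(p * (1 - p)))       # standard deviation
```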

Example

Consider a lottery where you try to guess up to 6 numbers from 49. Assume, for simplicity, that you win (it is a SUCCESS) if you guessed at least 4 numbers and you don't win (a FAILURE) if you guessed 3 numbers or less. Using the results of calculation made in one of the previous lectures, the probability of SUCCESS is about 0.001.
Assume further that in case of SUCCESS you win a prize of $1000, which we assume to be equal to 1 measured in thousands of dollars and get nothing in case of FAILURE.
What are the expectation and standard deviation of your winning?

Since the probability of SUCCESS equals 0.001, the mean of the winning equals 0.001 (measured in thousands of dollars), that is $1.
The standard deviation equals
√(0.001·0.999) ≅ 0.0316
(also in thousands of dollars) which is about $32.
So, your average win will be $1, but deviations from it are quite substantial, averaging around $32, but, obviously, might be much greater than that.
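The arithmetic of this example can be repeated in dollars instead of thousands of dollars. In the sketch below the winning is the prize multiplied by a Bernoulli random variable, so both the mean and the standard deviation are multiplied by the prize. The function name bernoulli_prize_stats is hypothetical.

```python
from math import sqrt


def bernoulli_prize_stats(p, prize):
    """Mean and standard deviation of a winning equal to prize or 0."""
    mean = p * prize
    std = sqrt(p * (1 - p)) * prize
    return mean, std


mean_dollars, std_dollars = bernoulli_prize_stats(0.001, 1000)
print(mean_dollars)  # average winning in dollars
print(std_dollars)   # standard deviation in dollars
```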

Tuesday, October 21, 2014

Unizor - Probability - Binary Distributions

Binary probability distribution is a distribution related to random experiments with just two outcomes.

In this lecture we will consider two binary distributions:
Bernoulli distribution and Binomial distribution.

Bernoulli Distribution

This is a distribution within a sample space that contains only two elementary events called SUCCESS and FAILURE. Then the measure of probability p (0≤p≤1) is assigned to one of them and the measure of probability q=1−p is assigned to the other.

Usually, we will not deal with this sample space or its elementary events, but, instead, assume that there is a random variable ξ, defined as a numeric function on this sample space, that takes the value of 1 on one elementary event - ξ(SUCCESS)=1 - and the value of 0 on another - ξ(FAILURE)=0, with probabilities, correspondingly, p and q=1−p.
Symbolically,
P(ξ=1) = p
P(ξ=0) = q = 1−p

We can describe this differently, using the random variable ξ we defined above that takes two values 1 and 0, correspondingly on SUCCESS and FAILURE. Assume we repeat our experiment with two outcomes again and again, and the result of the Jth experiment is Ej. Then ξ(Ej)=1 if Ej=SUCCESS and ξ(Ej)=0 if Ej=FAILURE.
Then, if we conduct N experiments, the sum of all ξ(Ej), where J runs from 1 to N, symbolically expressed as Σ{J∈[1,N]} ξ(Ej), is the number of times our experiment ended in SUCCESS.
Therefore, the ratio of the number of SUCCESS outcomes to a total number of experiments equals
[Σξ(Ej)] / N
Since the limit of this ratio, as the number of experiments increases to infinity, is the definition of the measure of probability of the outcome SUCCESS, we can write the following scary looking equality that symbolically states what we talked about when defining the Bernoulli distribution:
lim(N→∞){[Σξ(Ej)] / N} = p

A possible interpretation of the above equality that involves the limits might be that with large number of experiments N the number of SUCCESS outcomes is approximately equal to p·N.
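This interpretation can be illustrated by a simulation. The sketch below repeats a Bernoulli experiment many times using the pseudo-random generator from the Python standard library and prints the ratio of SUCCESSes to the number of experiments. The function name success_frequency, the seed and the value p=0.3 are assumptions chosen for illustration, and a simulation only suggests the limit, it does not prove it.

```python
import random


def success_frequency(p, trials, seed=0):
    """Ratio of SUCCESS outcomes to the number of simulated experiments."""
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    # rng.random() is uniform on [0, 1), so it is below p with probability p
    successes = sum(1 for _ in range(trials) if rng.random() < p)
    return successes / trials


for trials in (100, 10_000, 100_000):
    print(trials, success_frequency(0.3, trials))
```

With more experiments the printed ratio should, as a rule, stay closer to p.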

A simple example of a Bernoulli distribution is coin tossing. With an ideal coin the heads and tails have equal chances to come up, therefore their probabilities are 1/2 each:
P(HEADS) = P(TAILS) = 1/2
If we associate a random variable ξ with this random experiment and set ξ(HEADS)=1 and ξ(TAILS)=0, we obtain a classic example of a Bernoulli random variable ξ:
P(ξ=1) = p = 1/2 and
P(ξ=0) = q = 1−p = 1/2

Binomial Distribution

Consider N independent Bernoulli random experiments with results SUCCESS or FAILURE and the same probability of SUCCESS in each one. The number of SUCCESSes among the results of this combined experiment is a random variable. It can take values from 0 to N with different probabilities. The distribution of this random variable is called Binomial.

Obviously, we are interested in quantitative characteristics of this distribution, more precisely, we would like to calculate the probability of having exactly K SUCCESSes out of N independent Bernoulli experiments with the probability of SUCCESS equal to p in each one, where K can be any number from 0 to N.

Using the language of Bernoulli random variables, our task can be formulated differently.
Let ξi be a Bernoulli random variable that describes the i-th Bernoulli experiment, that is, it equals 1 with a probability p and equals 0 with a probability q=1−p. Then the sum of N such random variables is exactly the number of SUCCESSes in N Bernoulli experiments we are talking about.
So, the random variable
η = Σ ξi ,
where all ξi are independent Bernoulli random variables,
has Binomial distribution.

Let's now calculate the probabilities of our Binomial random variable η to have different values, that is let's determine the quantitative characteristic of this distribution.

We are interested in determining the value of P(η=K) for all K from 0 to N.

For a sum η of N independent Bernoulli variables ξi to be equal to K, exactly K out of N of these Bernoulli variables must be equal to 1 and the other N−K variables must be equal to 0.
From combinatorics theory we know that we can choose K elements out of N in
C(N,K) = (N!)/[(K!)·(N−K)!]
ways.
Once chosen, these K random variables must all be equal to 1, which happens with a probability p^K, and the other N−K variables must all be equal to 0, which happens with a probability q^(N−K).
That determines the probability of a Binomial random variable to have a value of K:
P(η=K) = C(N,K) · p^K · q^(N−K)
The only two parameters of this distribution are the number of Bernoulli random variables N participating in the Binomial distribution (which is the number of Bernoulli random experiments whose results we follow) and the probability of SUCCESS p for each such Bernoulli experiment.
The probability q is not a new parameter since q=1−p and the formula above can be written as
P(η=K) = C(N,K) · p^K · (1−p)^(N−K)
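The description of η as a sum of independent Bernoulli random variables suggests a simple experiment: simulate the N variables many times, add them up and compare the observed frequencies of each sum with the formula. The sketch below does this. The function name simulate_binomial_frequencies, the seed and the values of N and p are our own assumptions.

```python
import random
from math import comb


def simulate_binomial_frequencies(n, p, samples, seed=0):
    """Observed frequencies of eta = 0..n, where eta is a sum of n Bernoulli variables."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(samples):
        # each term is one Bernoulli variable: 1 with probability p, else 0
        eta = sum(1 if rng.random() < p else 0 for _ in range(n))
        counts[eta] += 1
    return [count / samples for count in counts]


n, p = 5, 0.3
observed = simulate_binomial_frequencies(n, p, 20_000)
for k, frequency in enumerate(observed):
    expected = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(k, round(frequency, 4), round(expected, 4))
```

The two columns should be close to each other, though not identical, since the frequencies come from a finite number of simulated experiments.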

Wednesday, October 8, 2014

Unizor - Probability - Continuous Distribution

Let's consider a different random experiment that is modeled by an infinite and uncountable set of elementary events and, correspondingly, uncountable number of values of a random variable defined on them.

For example, we measure weights of tennis balls manufactured by a particular plant. In theory (and we are talking about a mathematical model, not the real world), this weight can be any real number within certain limits, say from 50 to 60 grams. Let's assume that we can measure the weight absolutely precisely. Repeating the experiment again and again, counting the times this weight exactly equals, say, 55 grams and taking the ratio of this count to the total number of experiments, we will get closer and closer to zero. The same holds for any other specific weight.

So, any particular value of our random variable has a probability of zero, but the number of these values is a number of all real numbers from 50 to 60 - an uncountable infinity. This presents a mathematical challenge in operating with particular values of this random variable.

To overcome this challenge, instead of considering individual values of our random variable, we should consider intervals.
In our example of the weight of a tennis ball, we can talk about a probability of this weight to be in the interval from, say, 54 to 56 grams. This probability will be greater than zero. The wider our interval - the larger the probability. At the extreme, for an interval from 50 to 60 grams, the probability will be equal to 1 because all tennis balls are manufactured with the weight in this interval.

The probability of our random variable having any particular exact value equals zero, but it has a non-zero probability of having a value within some interval, different probabilities for different intervals. Such random variables are called continuous.
Then for such a random variable ξ we can say that the probability of ξ to take a value in the interval [a,b] equals p (which depends on a and b). Usually, all possible values of a continuous random variable constitute a finite or infinite continuous interval. The probability of this random variable to take a value within this interval equals 1 and the probability of it to take a value in some narrower interval is less than 1.

Example - Sharp Shooting Competition

Sharpshooters are shooting a target, and the random variable we are interested in is the distance of a point where a bullet hits the target from the target's center.
For a particular sharpshooter, assuming his skill level is constant, he does not get tired and does not miss by more than 0.5 meters, the continuous distribution of this random variable is defined on the range of values from 0 (when he hits exactly the center of a target) to a maximum deviation of his bullet from the center that we assumed to be 0.5 meters.

For any exact value of our random variable, say, 0.2768 meters, the probability of having this value is 0. It is obvious if we recall that the probability is a limit of the ratio of the number of occurrences of a particular event to a total number of experiments. As the number of shots a sharpshooter fires grows to infinity, the ratio of the number of shots at a distance of exactly 0.2768 meters from the center, as well as at any other exact distance, tends to zero.

The continuous distribution of probabilities can (only approximately) be represented graphically, similarly to the way we presented the discrete distributions. First of all, we break the entire range of values of our random variable (the distance from the center of a target) into 5 smaller intervals, each 0.1 meters wide, and mark these points on the X-axis: 0, 0.1, 0.2, 0.3, 0.4 and 0.5. On each interval between these values we construct a rectangle of a height equal to the corresponding probability of our random variable to fall within this interval. So, for a random variable ξ (results of the first shooter) the rectangles might be:
base [0,0.1] - height 0.6
base [0.1,0.2] - height 0.2
base [0.2,0.3] - height 0.1
base [0.3,0.4] - height 0.07
base [0.4,0.5] - height 0.03

Obviously, the more intervals we use to break the entire range of values of a random variable into smaller intervals - the more precisely we can characterize the continuous distribution.
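In practice the heights of such rectangles would be estimated from recorded shots. The sketch below counts what fraction of the distances falls into each interval. The function name interval_probabilities is our own, and the list of distances is invented purely for illustration, it is not real shooting data.

```python
def interval_probabilities(samples, low, high, bins):
    """Fraction of samples in each of the equal intervals covering [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in samples:
        # a value equal to the upper end goes into the last interval
        index = min(int((x - low) / width), bins - 1)
        counts[index] += 1
    return [count / len(samples) for count in counts]


# Invented distances (in meters) from the center of the target.
distances = [0.02, 0.05, 0.08, 0.03, 0.07, 0.09, 0.12, 0.18, 0.25, 0.41]
print(interval_probabilities(distances, 0.0, 0.5, 5))
```

Calling the same function with a larger number of intervals (and many more recorded shots) gives a finer picture of the distribution.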

Tuesday, October 7, 2014

Unizor - Probability - Discrete Distribution

In the previous lectures we introduced a concept of a random experiment and, as its mathematical model, we described sample space Ω (a set) that contained finite number N of elementary events e1, e2,...,eN (elements of this set) modeling the results (outcomes) of our random experiment.

With each such elementary event eK (where K∈[1,N]) we associated certain real number pK, its probability
0 ≤ pK ≤ 1 (K∈[1,N])
ΣpK = 1 where K∈[1,N]

This probability has all the characteristics of a measure (like weight, length or area) - it is non-negative and additive - with the only restriction that the sum of all probability measures of all elementary events totals to 1 since it reflects the ratio of occurrence of any outcome to a total number of experiments.

The set of probabilities p1, p2,...,pN is called a probability distribution of our random experiment with finite number of outcomes.

If there is a random variable ξ defined on the set of elementary events that takes the value xK for an event eK (K∈[1,N]), then these same probabilities define the probability distribution of the random variable ξ:
P(ξ=x1) = p1
P(ξ=x2) = p2
...
P(ξ=xN) = pN

As you see, in this description we address random experiments with a finite number of outcomes. The probability distributions associated with these random experiments, their elementary events and random variables defined on these events are called discrete since the different values of probabilities as well as the different values of random variables defined on the elementary events are separated from each other. They can be represented as individual points on a numeric line.

In a more complicated case there might be an infinite but countable number of elementary events and values of a random variable defined on them.
For example, an experiment might be to choose any natural number N (this is the elementary event and, at the same time, the value of a random variable ξ defined on it) and assign a value of 1/(2^N) to a probability associated with this elementary event:
P(ξ=N)=1/(2^N).
There is an infinite but countable number of different elementary events, all probabilities are in the range from 0 to 1 and their sum (which is a sum of an infinite geometric progression 1/2 + 1/4 + 1/8 + ...) equals 1, as can be easily shown.
The distribution of probabilities in this and analogous cases is also considered discrete since there is always a non-zero distance between different measures of probabilities and, if a random variable is defined on these elementary events, the values of such random variable will also be discrete, that is separated from each other.

Incidentally, for the example above it would be interesting to calculate the expected value of the random variable ξ, that is to find a sum
Σ[K/(2^K)] where K∈[1,∞).
It's a good problem on geometric progression and we recommend trying to solve it yourselves. The answer should be 2, by the way, but you need a little trick to come up with it.
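Without giving away the trick, the answer can at least be checked numerically. The sketch below adds up the first terms of the sum using exact fractions from the Python standard library. The function name partial_sum is an assumption for illustration.

```python
from fractions import Fraction


def partial_sum(n):
    """Exact value of 1/2 + 2/4 + 3/8 + ... + n/(2**n)."""
    return sum(Fraction(k, 2**k) for k in range(1, n + 1))


for n in (1, 2, 5, 10, 20):
    s = partial_sum(n)
    print(n, s, float(s))
```

The partial sums grow towards 2 and never exceed it, which agrees with the stated answer.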

It's very useful to represent the distribution of probabilities in a graphic form using a concept of an area as a substitute for a probability measure. On an X-axis in this representation we will use the points 1, 2, 3,... as the corresponding representation of elementary events e1, e2, e3,... This can be used for both finite and infinite countable number of elementary events.
Next, on each segment from K−1 to K we build a rectangle of the height equal to the probability of the elementary event eK. The resulting figure - a combination of all such rectangles - is a good representation of the distribution of probabilities among elementary events. Its total area would always be 1 and elementary events with larger probability measure would be represented by higher rectangles.

For example, the distribution of probabilities for an ideal die would be a set of 6 rectangles of equal height of 1/6.

In the example above, when the probability of choosing the number N equals 1/(2^N), the picture would be different. The rectangle built on a segment from 0 to 1 will have a height 1/2, from 1 to 2 - 1/4, from 2 to 3 - 1/8 etc. Every next rectangle would have a height equal to a half of the previous one, sloping down to 0 as we move to infinity.
