Thursday, July 31, 2014

Unizor - Probability - Conditional Probability Definition

Assume we have a set of N elements representing a sample space of N elementary events with equal chances of occurrence and, therefore, a probability measure of 1/N allocated to each element of this set. Assume further that we are interested in an event represented by a subset A of this set that contains M elements. The condition we impose on this model is that only the K elementary events represented by the elements of a subset B can actually occur as a result of an experiment. This forces us to redistribute the probability measure among the elements of this subset B only, allocating 1/K to each of them and 0 to all elements outside of the subset B. Since we are interested in the elements of the subset A, we have to choose from all M of them only those that are also part of the subset B, that is, only those from the intersection A∩B, because all other elements have a measure of 0 allocated to them. Let's assume that A∩B contains L elements. Then the conditional probability of the event represented by the subset A under the condition represented by the subset B equals P(A|B)=L/K. But exactly the same result can be obtained by dividing L/N by K/N. Notice now that L/N is P(A∩B) and K/N is P(B). Therefore, we have shown that
P(A|B) = P(A∩B) / P(B)

This equation basically means that the conditional probability of some random event A under the condition of an occurrence of another random event B is the fraction of the measure allocated to the occurred event B that is taken up by the elementary events of A∩B, those common to the condition B and the event A we are interested in. When events are graphically represented as figures on a plane and probability is interpreted as area, this equation becomes quite obvious.

This simple equation can be extended to cases of non-equal chances of elementary events and also to infinite sample spaces. Actually, in a rigorously constructed Probability Theory this equation is used as a definition of the conditional probability.
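As a quick illustration, here is a short Python sketch (the events A and B below are my own illustrative choices, not from the text) that computes a conditional probability both by direct counting and through the formula P(A|B) = P(A∩B)/P(B):

```python
from fractions import Fraction

# Illustrative events for one roll of a fair die (N = 6 equally likely
# outcomes): A = "result is even", B = "result is at least 4".
sample_space = {1, 2, 3, 4, 5, 6}
A = {x for x in sample_space if x % 2 == 0}   # {2, 4, 6}
B = {x for x in sample_space if x >= 4}       # {4, 5, 6}

N = len(sample_space)
K = len(B)
L = len(A & B)  # elements of the intersection A∩B, here {4, 6}

# Direct count: measure 1/K on each element of B, summed over A∩B.
p_direct = Fraction(L, K)

# Via the definition: P(A|B) = P(A∩B) / P(B) = (L/N) / (K/N).
p_formula = Fraction(L, N) / Fraction(K, N)

assert p_direct == p_formula
print(p_direct)  # 2/3
```

Both routes give the same value, exactly as the derivation above shows.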

Tuesday, July 29, 2014

Unizor - Probability - The Main Task

The main task of the theory of probabilities is to determine the probabilities of different events, provided that we know all the elementary events that comprise our sample space (the outcomes of a random experiment), that we know the probabilities of these elementary events based on our experience or knowledge, and that we understand the composition of the event we are interested in from the given elementary events.

Thursday, July 24, 2014

Unizor - Probability - Event Logic

Based on a model of the random experiments and their resulting events as sets, elements and subsets with certain measure allocated to them, we can introduce logical operations on events and express them through logical operations on sets and subsets.

Let's analyze an example. Assume we are interested, in a single dice rolling, in a result divisible by either 2 or 3. There are 6 elementary events in this experiment, each with a probability of 1/6. Four results, namely 2, 3, 4 and 6, make up the event we are interested in, so the probability of this event should be 4/6=2/3. Let's approach this from a more formal viewpoint. The results of the experiment are modeled as a set of 6 elements {1,2,3,4,5,6} with a measure of 1/6 allocated to each one. The event "Result is divisible by 2" is represented by the subset {2,4,6} (three elements). The event "Result is divisible by 3" is represented by the subset {3,6} (two elements). As you see, these two subsets have a common element, 6. Simple addition of measures to obtain the probability of one OR the other event occurring yields the wrong answer 5/6. That is because the measure of a union of two subsets is not equal to the sum of their measures when they have common elements. To model the OR condition between two events correctly, we should take the union of the subsets that represent these events according to the rules of set operations. This union is {2,3,4,6} (the element 6 is included only once, as the union operation requires). This new subset, the union of the two subsets, has a measure of 4/6=2/3, as it should.

So, as we see, the logical OR operation between two events is correctly represented in a formal model by the union of the subsets representing these two events.

Let's examine other logical operations between events. The operation AND is naturally represented as an intersection of the subsets that represent our events. Consider the following example. We are interested in getting a number divisible by both 2 and 3 when rolling a single dice. Obviously, there is only one outcome that satisfies this condition - the number 6. So, the probability of the occurrence of this event equals 1/6. From the formal standpoint of set theory, the event "Result is divisible by 2" is represented by the subset {2,4,6} (three elements), and the event "Result is divisible by 3" is represented by the subset {3,6} (two elements). The intersection of these two subsets is {6}, which represents the event "Result is divisible by 2" AND "Result is divisible by 3".

The last logical operation we consider is NOT. Its formal representation in set theory is the operation of complement. The single dice rolling event "Result is not divisible by 3" consists of the elementary events 1, 2, 4 and 5. Formally, this event is represented as the complement of the representation of the event "Result is divisible by 3", which is the subset {3,6} in the set representing our sample space.

Here is the final thought we can derive from the above analysis of the correspondence between logical operations on random events and set-theory operations on the subsets representing these events. The entire Theory of Probabilities can be considered a particular case of set theory with an additive measure introduced on the subsets of a main set that represents the sample space of a random experiment, with the condition that the measure of the entire set equals 1.
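The die-rolling examples above can be verified in a few lines of Python, treating events as subsets and probability as the additive measure len(subset)/6 (a sketch mirroring the OR/AND/NOT correspondence described above):

```python
from fractions import Fraction

sample_space = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    # Additive measure: each of the 6 elements carries 1/6.
    return Fraction(len(event), len(sample_space))

div2 = frozenset({2, 4, 6})   # "Result is divisible by 2"
div3 = frozenset({3, 6})      # "Result is divisible by 3"

# OR -> union; the common element 6 is counted only once.
assert prob(div2 | div3) == Fraction(2, 3)            # {2,3,4,6}
# Naive addition of measures over-counts the intersection.
assert prob(div2) + prob(div3) == Fraction(5, 6)

# AND -> intersection.
assert prob(div2 & div3) == Fraction(1, 6)            # {6}

# NOT -> complement within the sample space.
assert prob(sample_space - div3) == Fraction(2, 3)    # {1,2,4,5}
```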

Wednesday, July 23, 2014

Unizor - Probability Definition - Equal Chances

Based on the previous philosophical discussion about probability let's attempt to define this concept more rigorously.
By now we should understand the concept of an event (like "getting an even number on a dice") as a result of a random experiment (like "rolling the dice"). We have also discussed the concept of an elementary event - a result of a random experiment that is, in some sense, the "smallest", in that it cannot be represented as a combination of "smaller" events (like "getting 6 on a single dice rolling"). We also introduced the concept of a sample space as a set of all possible elementary events (like {"getting 1", "getting 2", "getting 3", "getting 4", "getting 5", "getting 6"} as a result of rolling a dice). All these concepts were discussed and exemplified in the previous lectures. In all the cases that we considered it was possible to identify elementary events in such a way that they seemed to have equal chances of occurrence (flipping symmetrical coins, rolling perfect dice, dealing a randomly shuffled deck of cards among players, etc.)

Notice that in all the examples we presented we were dealing with a finite number of elementary events and equal chances of their occurrence. Since all our elementary events had equal chances to occur, we assumed that the frequency of occurrence of each such elementary event would tend to 1/N, where N is the total number of elementary events, as the number of experiments increases to infinity.
Therefore, it was reasonable to assume the probability of each elementary event to be 1/N.

To evaluate the probability of different events, we compared the numbers of elementary events that comprise them. Assuming, again, equal chances of each elementary event to occur, the frequency of occurrence of any event seems to be proportional to the number of elementary events that comprise it and, therefore, the probability of any event should be equal to M/N, where M is the number of elementary events that comprise our event and N is the total number of elementary events.

Let's use some abstraction to define these concepts a little more formally, as is customary in mathematics. This will allow us to extend the applications of the theory to all other cases that fit our abstraction.

A very simple abstract model of all the concepts above is a finite set of elements with the following properties.
Each element of this set has a numerical measure associated with it that is equal to 1/N, where N is the number of elements in our set.
Every subset of our set has an associated numerical measure equal to M/N, where M is the number of elements in this subset. This is equivalent to saying that the measure of any subset equals the sum of the measures of the elements that comprise it.
From this we can easily derive that the measure of an empty subset equals 0, the measure of a full subset that coincides with the whole set equals 1, and the measure of a union of two subsets that have no common elements equals the sum of their measures (the additive property of our measure).

Now let's define the terminology to bridge the gap between the abstraction above and all we know about probability.
The set we introduced above we will call a "sample space".
We will call the elements of this set "elementary events", and the measure associated with each of them we will call its "probability"; it is equal to 1/N, where N is the number of elements (which we called elementary events) in our set (which we called a sample space).
We further call any subset of that set (that is, of a sample space) an "event". It contains a certain number of elements (that is, elementary events), and its "probability" is its measure as defined above, M/N, where M is the number of elements in this subset and N is the total number of elements in the set.

Summarizing our abstraction, we model the results of any random experiment as
i) a finite set
ii) with an additive measure defined for each element and each subset,
iii) with the measure of an entire set equal to 1,
iv) with equal measures associated with each element,
v) with, consequently, a measure of each element equal to 1/N,
vi) with, consequently, a measure of each subset equal to a sum of measures of elements that comprise it.

From now on we assume exactly these properties when we discuss sample space (sets), events (subsets of this set), elementary events (elements of this set), probability (additive measure on this set).
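A minimal sketch of this abstraction in Python (the helper `measure` is a name of my choosing) checks the listed properties on a small example:

```python
from fractions import Fraction

def measure(subset, sample_space):
    # Equal-chance model: each of the N elements carries 1/N,
    # so a subset of M elements carries M/N.
    assert subset <= sample_space
    return Fraction(len(subset), len(sample_space))

S = frozenset(range(1, 7))  # any finite set will do; a die here

assert measure(frozenset(), S) == 0   # empty subset has measure 0
assert measure(S, S) == 1             # the entire set has measure 1
A = frozenset({1, 2})
B = frozenset({5, 6})                 # disjoint from A
assert measure(A | B, S) == measure(A, S) + measure(B, S)  # additivity
```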

We would like to mention that the above abstraction works only for random experiments with finite number of symmetrical results having equal chances to occur.
The complete Theory of Probabilities deals also with experiments having infinite number of results and more abstract definitions of measure, but these are outside the scope of this course.

Friday, July 11, 2014

Unizor - Probability Examples - Dice Games

Craps
In the game of craps a player rolls two dice. Two numbers on top of these dice are summed up. The sums that are equal to 7 or 11 represent a win for some players.
What is the probability of having 7 or 11 as a sum of the numbers on two dice?

Solution
First of all, we have to establish the set of elementary events (or the sample space, as it is called) as the equally probable symmetrical results of this experiment. Obviously, all the pairs of numbers from 1 to 6 can be on top of two dice with equal chances. Therefore, our set contains 6·6=36 elementary events, each of which has the probability of 1/36.
Now we have to determine the number of those pairs of numbers from 1 to 6 that in sum equal to 7 or 11.
To get the sum of 7 on two dice we can have combinations {1,6}, {2,5}, {3,4}, {4,3}, {5,2} and {6,1} - 6 different combinations.
To get the sum of 11 on two dice we can have combinations {5,6} and {6,5} - 2 different combinations.
Therefore, there are 6+2=8 different elementary events (pairs of numbers from 1 to 6) that make up an event of having 7 or 11 as a sum of the numbers on two dice. Since the probability of each elementary event is 1/36, the probability of the event of having 7 or 11 equals to 8/36=2/9.
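This count can be double-checked by brute force, enumerating all 36 ordered pairs (a verification sketch in Python):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered pairs on two dice.
pairs = list(product(range(1, 7), repeat=2))
wins = [p for p in pairs if sum(p) in (7, 11)]

assert len(pairs) == 36
assert len(wins) == 8          # 6 ways to make 7, 2 ways to make 11
p_win = Fraction(len(wins), len(pairs))
print(p_win)  # 2/9
```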

Yahtzee
This game is played with five dice and different combinations have different seniority.
What is the probability of having a combination of three dice having the same number on top and the other two dice having the same number on top, which is different from the first number?

Solution
The sample space in this case consists of all sequences of five numbers, each from 1 to 6. Therefore, there are 6^5 different elementary events, each having a probability of 1/6^5.
Now let's count the "good" elementary events - those that have three equal numbers and two other equal numbers among five numbers in a sequence.
First, count all the distributions of five dice into two groups of three and two dice. This can be done in C(5,3)=10 ways. Then pick a number for the triplet of dice, one of 6. Then we have to pick the number for the pair of dice out of the remaining 5 numbers.
The result is a multiplication of the numbers of choices:
10·6·5 = 300
Therefore, the probability of the Full House in the game of Yahtzee equals to
300/6^5 ~= 0.03858 ~= 4%.
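A brute-force enumeration of all 6^5 = 7776 sequences confirms the count of 300 (a verification sketch):

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Count sequences of five dice whose value counts are exactly 3 and 2.
full_house = sum(
    1
    for roll in product(range(1, 7), repeat=5)
    if sorted(Counter(roll).values()) == [2, 3]
)

assert full_house == 300
p = Fraction(full_house, 6 ** 5)
print(float(p))  # ~0.03858
```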

First Game of Chevalier de Mere
Chevalier de Mere lived in the XVII century and was a gambler. Here is a dice game he suggested.
He rolls a dice four times. If the number 1 comes up at least once, he wins. If not, he loses.
What's the probability to win for him?

Solution
The sample space in this case is all sequences of four numbers from 1 to 6 each. There are 6^4 different statistically equivalent elementary events in this space, each having a probability of 1/6^4.
Now let's count the "good" elementary events - those that have at least one number 1 among four numbers in a sequence. We will count the sequences that do not contain the number 1 and then subtract it from the total number of elementary events. So, the total number of all sequences equals to 6^4, as indicated above. The count of sequences without number 1, similarly, is 5^4, since we can use only five digits from 2 to 6.
Therefore, the number of "good" elementary events equals to
6^4 − 5^4.
The probability of winning, therefore, equals to
(6^4−5^4)/6^4 = 1−(5/6)^4 ≅ 0.5177
This probability is greater than 50% and, therefore, Chevalier de Mere won this game more often than he lost it.
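The computation can be reproduced exactly with fractions (a sketch):

```python
from fractions import Fraction

# Complement count: 5^4 of the 6^4 equally likely sequences contain no 1.
p_win = 1 - Fraction(5, 6) ** 4

assert p_win == Fraction(6 ** 4 - 5 ** 4, 6 ** 4)
assert p_win > Fraction(1, 2)   # a favorable game
print(float(p_win))  # ~0.5177
```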

Second Game
of Chevalier de Mere
After winning some money in the game of hitting the number 1 at least once in four rolls of a dice, and losing many companions because of that, Chevalier de Mere decided to invent another game.
He rolls two dice 24 times. If number 1 comes up simultaneously on both dice at least once in this series, he wins. If not, he loses.
Unfortunately for him, he has lost a lot in this game.
What's the probability to win for him?

Solution
The sample space in this case is all sequences of 24 pairs of numbers from 1 to 6. There are 36^24 different elementary events in this space, each having a probability of 1/36^24.
Now let's count the "good" elementary events - those that have at least one pair {1,1} among 24 pairs in a sequence. We will count the sequences that do not contain the pairs {1,1} and then subtract it from the total number of elementary events. So, the total number of all sequences of pairs equals to 36^24, as indicated above. The count of sequences without pairs {1,1}, similarly, is 35^24, since we can use any other pair of numbers from 1 to 6 but {1,1} on each of 24 positions.
Therefore, the number of "good" elementary events equals to
36^24 − 35^24.
The probability of winning equals to
(36^24−35^24)/36^24 = 1−(35/36)^24 ≅ 0.4914
It's less than 50% and, therefore, Chevalier de Mere lost this game more often than he won it.
After losing a lot of money he requested the help of the famous mathematician Blaise Pascal, which was the beginning of the Theory of Probabilities as a mathematical subject.
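The same complement argument for the second game, in a couple of lines:

```python
from fractions import Fraction

# Complement count: 35^24 of the 36^24 sequences contain no {1,1} pair.
p_win = 1 - Fraction(35, 36) ** 24

assert p_win < Fraction(1, 2)   # an unfavorable game
print(float(p_win))  # ~0.4914
```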

Unizor - Probability and Prediction

Yet another viewpoint to probability is its direct connection to prediction.
Why do we need probability after all? To make a prediction.
If we did not use probabilities to predict the occurrence or non-occurrence of certain events, we would not need them at all. So, the ability to predict is the main purpose of introducing probabilities.

So, from the viewpoint of prediction, the statement that "the probability of a dice to have number 3 on top is equal to 1/6" means that in the course of many random independent experiments of throwing a dice we expect the number 3 to be on top in approximately 1/6 of the cases. It does not mean that in every series of 6 experiments the number 3 will be on top exactly once. Nor does it mean that in every series of 600 experiments the number 3 will be on top exactly 100 times. What it means is that, if we conduct N experiments and the number of times the number 3 is on top equals K(N), then the fraction P(N) = K(N) / N tends to 1/6 as N increases to infinity.
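This limiting behavior can be illustrated by simulation (a sketch; the seed is an arbitrary choice for reproducibility, and the observed fraction only approximates 1/6):

```python
import random

random.seed(0)  # arbitrary seed, for a reproducible run

N = 100_000
count_3 = sum(1 for _ in range(N) if random.randint(1, 6) == 3)
freq = count_3 / N

print(freq)                      # close to 1/6 ~ 0.1667
assert abs(freq - 1 / 6) < 0.01  # within one percentage point here
```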

Can the number 3 be on top of a dice 10 times in a row? Yes. It's still possible. But this is a different experiment with different elementary events and different probabilities. In this case a single experiment is throwing a dice 10 times in a row as a series. An elementary event is a series of 10 numbers (each from 1 to 6) with a probability of 1/6^10 for each such series. That very small value is the probability of having number 3 on top 10 times or number 2 on top 10 times or any other series of 10 numbers, 1 to 6 each.

But if we want to evaluate, approximately, how many times the number 3 comes up on top among 10 throwings, we have to compute, for each count K from 0 to 10, the probability that a series contains the number 3 exactly K times, by adding up the probabilities of the corresponding elementary events. Let's evaluate it. For the number 3 to be on top K times out of 10 we have to pick K positions out of 10 (there are C(10,K) ways to do it). These K positions must be occupied by the number 3 (so there is only one choice for them), while the other 10−K positions can be occupied by any other number (that is, 1, 2, 4, 5 or 6) - 5 choices for each position, which makes 5^(10−K) combinations. Therefore, the number of combinations of 10 numbers from 1 to 6 each with K number 3's among them equals C(10,K)·5^(10−K). That is the number of elementary events in our experiment with 10 rolls that contain the number 3 exactly K times.
So, the probability of having number 3 exactly K times equals to a probability of a single elementary event multiplied by this number, that is
P(K)=C(10,K)·5^(10−K)/6^10

The approximate values of these probabilities for different K are:
P(0)=0.16; P(1)=0.32; P(2)=0.29; P(3)=0.15; P(4)=0.05; P(5)=0.01
and for greater K the probabilities are less than 0.01.
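These values follow directly from the formula above; a short check in Python:

```python
from math import comb

def p(k):
    # P(K) = C(10,K) * 5^(10-K) / 6^10, as derived above.
    return comb(10, k) * 5 ** (10 - k) / 6 ** 10

# The probabilities over all K from 0 to 10 sum to 1 ...
assert abs(sum(p(k) for k in range(11)) - 1) < 1e-12
# ... and agree with the approximate values quoted above.
for k, approx in enumerate([0.16, 0.32, 0.29, 0.15, 0.05, 0.01]):
    assert abs(p(k) - approx) < 0.01
```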

So, if we want to predict the results of 10 random experiments of throwing a dice, we can say that "it's quite probable to have the number 3 on top somewhere from 0 to 4 times in each series of 10 experiments". A narrower prediction (like from 1 to 2 times) has a lower probability.
Of course, "quite probable" has its numerical equivalent, as we tried to demonstrate above. That's exactly what the Theory of Probabilities studies.

Tuesday, July 8, 2014

Probability and Frequency

In the Theory of Probabilities we deal with a concept of a random experiment which results in occurrence or not occurrence of certain events.

For the purposes of this introduction into Theory of Probabilities we assume that all these experiments are independent in a sense that the result of one experiment does not affect the result of any other.

We also assume that these experiments are repeatable, so we can repeat the same experiment under similar conditions with, possibly, different results.

For example, consider a random experiment of shuffling and dealing a standard deck of 52 cards among four players, 13 cards each. An event we are interested in is to have four aces distributed to four players in such a way that each player has one ace. This event might occur or not occur as a result of our experiment.

Another important concept is that of elementary events, from combinations of which any other event can be constructed. In the example above each individual distribution of 52 cards among four players, 13 cards each, constitutes such an elementary event, and we can construct the event we are interested in as a combination of particular distributions of certain cards to certain players. As we calculated while discussing Combinatorics, the number of these elementary events (that is, particular distributions of 52 cards among 4 players, 13 cards each) is 52!/(13!)^4.
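The count 52!/(13!)^4 is easy to verify (a sketch; the same number is obtained by dealing the four hands one after another):

```python
from math import comb, factorial

# 52! / (13!)^4 distributions of 52 cards into four hands of 13 each.
deals = factorial(52) // factorial(13) ** 4

# Cross-check: choose the first hand, then the second, then the third.
assert deals == comb(52, 13) * comb(39, 13) * comb(26, 13)
print(deals)
```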

What's important about this and all other experiments we will be discussing is our ability to repeat this exact experiment any number of times, so we can count how many times the event in question occurred and how many times it did not occur, calculating its frequency of occurrence as a ratio between the number of experiments the event occurred to a total number of experiments (sometimes expressed as a percentage).

Presumably, if we repeat our experiment more and more times, the frequency of occurrence of a certain event will come closer and closer to a certain limit value that can be called the probability of this event. This is not a precise mathematical definition of probability but rather a philosophical explanation. It obviously depends on the existence of this limit value, which we have no means to prove using just the above explanatory approach; we rather assume it. Precise mathematical foundations of this approach provide a solid base for this assumption.

A very important characteristic of our experiments is statistical similarity or inner symmetry of elementary events as their results. These elementary events have equal chances to occur and, by combining them in some way, we can construct other events we are interested in.

For example, since there are C(49,6) statistically equivalent results of choosing 6 winning numbers out of 49 in lottery, we can safely assume that the frequency of occurrence of one or any other particular set of 6 winning numbers will tend to 1/C(49,6) as the number of experiments (randomly picking 6 numbers out of 49) increases to infinity. Therefore, the probability of any one particular set of 6 winning numbers equals to 1/C(49,6).
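For the record, C(49,6) is easy to compute (a sketch):

```python
from math import comb

n = comb(49, 6)                  # equally likely draws of 6 out of 49
assert n == 13_983_816
print(1 / n)  # the probability of one particular winning set
```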

Notice that any event we are interested in (like each player having exactly one ace, or the dice having an even number on top, or guessing two out of six winning lottery numbers) is a combination of a certain number of equally probable elementary events symmetrical in the course of the experiment.

As you see, two characteristics are very important to understand the probability from the point of view of frequency - repeatability of the experiments under the same conditions and inner symmetry of the elementary events occurring as a result of the experiment and from which we can "construct" any event we are interested in.