## Friday, December 26, 2014

### Unizor - Limits - Number e as a Limit

We have introduced the number e, an extremely important number in calculus and analysis, as the base of an exponential function y=a^x that has a steepness of 1 at the argument value x=0. We have also indicated, without sufficient rigor, that this number is the limit of an infinite sequence:
e = lim [n→∞] (1+1/n)^n.
In this lecture we will rigorously prove that the above limit does exist and can be used as a definition of a number e.

In the lecture Problems 3 of this chapter about limits we have proved that an infinite monotonically increasing sequence that is bounded from above has a limit. This theorem is the foundation of this lecture. We will prove that the sequence (1+1/n)^n is monotonically increasing and has an upper bound. Therefore, it has a limit as n→∞, and that limit is the definition of the number e.

The fact that our sequence (1+1/n)^n is bounded from above has already been proven in the previous lecture about a function F(x)=e^x where we have proved that for any natural number n
2 ≤ (1+1/n)^n ≤ 3
So, all we have to prove now is the monotonic character of this sequence.

So, let's prove that for all n greater than 1
(1+1/n)^n ≥ [1+1/(n−1)]^(n−1)
A direct way to prove this is to use Newton's binomial formula.
The expression on the left is a sum of n+1 positive terms, the i-th of them (for i from 0 to n) being
n!/[(n−i)!·i!·n^i]
The expression on the right is a sum of n positive terms, the i-th of them (for i from 0 to n−1) being
(n−1)!/[(n−i−1)!·i!·(n−1)^i]
The expression on the left has one more positive term in the sum. Now we will prove that every i-th term of the left expression is not less than the corresponding i-th term of the right expression.
Indeed, we can get rid of i! since it is the same for both left and right common terms.
What remains from the left term can be written as
n·(n−1)·...·(n−i+1)/n^i =
= 1·[1−1/n]·...·[1−(i−1)/n]
What remains from the right term can be written as
(n−1)·(n−2)·...·(n−i)/(n−1)^i =
= 1·[1−1/(n−1)]·...·[1−(i−1)/(n−1)]
The term on the left is not less than the term on the right because each factor of the product on the left is not less than the corresponding factor of the product on the right, since
1−k/n ≥ 1−k/(n−1)
for each k from 1 to i−1.

This completes the proof that the sequence (1+1/n)^n is monotonically increasing. Together with the fact that it does not exceed 3 for any positive integer n, proven in the previous lecture about the function e^x (that is, 3 is an upper bound for this sequence), it proves that the sequence (1+1/n)^n tends to a limit, which is some real number between 2 and 3. This number is the number e introduced in the previous lecture from a different angle, as the base of an exponential function that has a steepness of 1 at point x=0.
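The two properties used in this proof, monotonic growth and boundedness, can also be observed numerically. The following is only a sketch in Python, not part of the proof; the helper name `a` and the range of n are chosen just for this illustration.

```python
import math

def a(n: int) -> float:
    """The n-th member of the sequence (1 + 1/n)^n."""
    return (1 + 1 / n) ** n

values = [a(n) for n in range(1, 2001)]

# Each member is not less than the previous one...
is_increasing = all(values[k] <= values[k + 1] for k in range(len(values) - 1))
# ...and every member stays between 2 and 3.
is_bounded = all(2 <= v <= 3 for v in values)

print(is_increasing, is_bounded)
# A few members of the sequence next to the library value of e.
print(a(1), a(10), a(1000), a(10**6), math.e)
```

A finite check like this proves nothing about all n, but it shows how slowly the sequence approaches its limit.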

Based on this, let's mention one more equality related to limits.
Let x be any real number. Consider an exponential function F(x)=e^x.
We can state now that
lim[n→∞](1+x/n)^n = e^x

Intuitively, it's obvious since, for x≠0 (for x=0 both sides are equal to 1),
(1+x/n)^n = [(1+x/n)^(n/x)]^x
When n goes to infinity, n/x goes to infinity as well (for positive x) and the expression in square brackets tends to e, so the whole expression tends to e^x. Although it's not an absolutely rigorous proof, this consideration should suffice for now.
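A quick numerical comparison supports this equality. The sketch below is in Python; the function name `approx_exp` and the sample values of x and n are assumptions made for this illustration.

```python
import math

def approx_exp(x: float, n: int) -> float:
    """The n-th member of the sequence (1 + x/n)^n."""
    return (1 + x / n) ** n

# Compare members of the sequence with the library value of e^x.
for x in (-1.0, 0.5, 2.0):
    for n in (10, 1000, 100000):
        print(x, n, approx_exp(x, n), math.exp(x))
```

The printed values get closer to e^x as n grows, including for a negative x, which the intuitive argument above does not cover directly.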

## Friday, December 19, 2014

### Unizor - Trigonometry - Exponentiation of Complex Numbers

Continuing on trigonometric representation of complex numbers, let's research how to raise a complex number in trigonometric form to some power (the process called exponentiation). It's important to understand that the rules we will be dealing with are not theorems that we prove, but definitions of new operations. We just prove that these definitions are reasonable in a sense that they preserve important properties of new operations, similar to properties we already know for real numbers, like
a^0 = 1 and
a^(P+Q) = a^P·a^Q and
a^(-P) = 1/(a^P) and
a^(P·Q) = (a^P)^Q

Since we know how to multiply two complex numbers in trigonometric form, we can easily derive a formula for raising a complex number to a positive integer power since this process is just a multiplication by itself certain number of times.
Indeed, using the property of multiplication addressed in the previous lecture,
[r·cos(φ)+i·r·sin(φ)]^2 =
= [r·cos(φ)+i·r·sin(φ)] ·
· [r·cos(φ)+i·r·sin(φ)] =
=r·r·cos(φ+φ)+i·r·r·sin(φ+φ)=
= r^2·cos(2·φ)+i·r^2·sin(2·φ) =
= r^2·[cos(2·φ)+i·sin(2·φ)]

Similarly, by induction, for any natural N,
[r·cos(φ)+i·r·sin(φ)]^N =
= r^N·cos(N·φ)+i·r^N·sin(N·φ) =
= r^N·[cos(N·φ)+i·sin(N·φ)]

Expanding this to negative integer numbers, we will use the main property of exponentiation we would certainly want to preserve:
a^(M+N) = a^M · a^N
and, derived from it for N=−M
1 = a^0 = a^(M−M) = a^[M+(−M)] = a^M · a^(−M)
from which follows:
a^(−M) = 1 / (a^M)

Therefore, it's reasonable to define an exponentiation of complex numbers with negative integer power −N (where N is a positive integer) as
[r·cos(φ)+i·r·sin(φ)]^(−N) =
= 1/[r^N·cos(N·φ)+i·r^N·sin(N·φ)]

Now it's easy to notice that for any argument (phase) ψ of a complex number in polar form
[cos(ψ)+i·sin(ψ)]·[cos(−ψ)+i·sin(−ψ)] =
= cos(ψ−ψ)+i·sin(ψ−ψ) =
= cos(0) + i·sin(0) = 1
Therefore,
1 / [cos(ψ)+i·sin(ψ)] =
= cos(−ψ)+i·sin(−ψ)

Using the above equality for exponentiation to a negative integer power and similar equality for real numbers 1/(r^N)=r^(−N), we derive
[r·cos(φ)+i·r·sin(φ)]^(−N) =
= r^(−N)·[cos(−N·φ)+i·sin(−N·φ)]
Notice the universality of the formula for natural exponent (a consequence of plain multiplication)
{r·[cos(φ)+i·sin(φ)]}^N =
= r^N·[cos(N·φ)+i·sin(N·φ)]
which has exactly the same form if N is a negative integer.
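The formula for integer exponents can be compared with plain repeated multiplication. Here is a minimal sketch using Python's standard `cmath` module; the function name `polar_power` and the sample values r=1.5, φ=0.7 are made up for this illustration.

```python
import cmath

def polar_power(r: float, phi: float, n: int) -> complex:
    """r^n * [cos(n*phi) + i*sin(n*phi)] for an integer n."""
    return cmath.rect(r ** n, n * phi)

r, phi = 1.5, 0.7
z = cmath.rect(r, phi)  # r*[cos(phi) + i*sin(phi)]

for n in (3, -2):
    # Left: the formula from this lecture. Right: Python's own power.
    print(n, polar_power(r, phi, n), z ** n)
```

Both columns agree up to rounding errors for positive and for negative n.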

Let's proceed to rational exponent of a complex number in trigonometric form.
The simple approach to define exponentiation with rational exponent is to use the above formula for integer N.
Let's define a new absolute value (modulus) q=r^N and a new argument ψ=N·φ.
Then r=q^(1/N), φ=ψ/N and the formula would look like this:
{q^(1/N)·[cos(ψ/N)+i·sin(ψ/N)]}^N =
= q·[cos(ψ)+i·sin(ψ)]
This is a justification for the general definition of raising a complex number to a power equal to a rational number 1/N (where N is a nonzero integer):
{q·[cos(ψ)+i·sin(ψ)]}^(1/N) =
= q^(1/N)·[cos(ψ/N)+i·sin(ψ/N)]
Notice, again, the general format of the formula is exactly the same as for positive integer exponents, that is the absolute value is raised to a power and an argument is multiplied by this power.

From here it's just one little step to derive a formula for any rational exponent. We just combine the rules for z^M and z^(1/N) into one rule for z^(M/N):
{r·[cos(φ)+i·sin(φ)]}^(M/N) =
= r^(M/N)·[cos((M/N)·φ)+i·sin((M/N)·φ)]

Imagine how cumbersome the formula for raising a complex number into a rational power would look in the traditional representation of a complex number as z=a+b·i.

To make this lecture complete, it's necessary to say a few words about irrational exponents. Here, as with raising real numbers to irrational power, exact rigorous mathematics is quite involved. Sufficient to say that the theory of limits is used and, as irrational numbers can be considered as limits of sequences of rational numbers, the definition of irrational exponentiation is based on the same limit principle and is defined as a limit of corresponding rational exponentiation.

Example
Let's see what (−1)^(1/2) is if we use the trigonometry.
(−1)^(1/2) = [cos(π)+i·sin(π)]^(1/2) = cos(π/2)+i·sin(π/2) = 0+i·1 = i
as expected, because this is a definition of a complex i.
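The same example can be reproduced in code. The sketch below uses Python's `cmath` module; `rational_power` is a hypothetical helper written for this illustration, and it relies on the argument returned by `cmath.polar`, which lies between −π and π.

```python
import cmath

def rational_power(z: complex, m: int, n: int) -> complex:
    """Raise z to the power m/n: modulus to the power, argument times the power."""
    r, phi = cmath.polar(z)  # modulus and argument of z
    p = m / n
    return cmath.rect(r ** p, p * phi)

root = rational_power(-1, 1, 2)
print(root)       # very close to i, up to rounding errors
print(root ** 2)  # very close to -1
```

Note that this produces only one value, the one that corresponds to the chosen argument of the number.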

## Saturday, December 13, 2014

### Unizor - Limits - F(x)=e^x

In the chapter describing exponential functions we introduced the concept of steepness β of an exponential function F(x)=a^x at point x=0 as a limit of a ratio between increment of a function between points x=0 and x=1/n (that is F(1/n)−F(0)=a^(1/n)−a^0) and a corresponding increment of an argument (that is 1/n−0) when n→∞, so a point x=1/n is getting closer and closer to a point x=0.
After a trivial simplification (since a^0=1) the steepness of a function F(x)=a^x at point x=0 is defined as
β = lim[n→∞](a^(1/n)−1)/(1/n) = lim[n→∞]n·(a^(1/n)−1)

At this point we would like you to recall the material presented in the lecture Algebra - Exponential Functions - Problems 2 about a steepness of exponential functions 2^x and 3^x at a point x=0. Refer to that lecture for refreshing.

We have proved there that the steepness of a function 2^x at point x=0 is less than 1, while the steepness of a function 3^x at point x=0 is greater than 1.
Assuming continuity of steepness as it depends on the base of the exponential function, it is reasonable to expect that somewhere between 2 and 3 there is a value of a base of an exponential function with the steepness exactly equal to 1. This value of a base is denoted as e in mathematics and its approximate value is 2.718.
Therefore, we can write:
lim[n→∞] (e^(1/n)−1)/(1/n) = 1
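The expression under the limit is easy to evaluate for large n. A small sketch in Python, where the function name `steepness_at_zero` and the chosen values of n are assumptions made for this illustration:

```python
import math

def steepness_at_zero(a: float, n: int) -> float:
    """n*(a^(1/n) - 1), the ratio whose limit is the steepness of a^x at x=0."""
    return n * (a ** (1 / n) - 1)

for base in (2.0, math.e, 3.0):
    print(base, [steepness_at_zero(base, n) for n in (10, 1000, 100000)])
```

For base 2 the values settle below 1, for base 3 above 1, and for base e they approach 1.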

Slightly deviating from the rigor we usually apply to our statements, we can derive from the last equality that there is an approximation
(e^(1/n)−1)/(1/n) ≅ 1
which becomes better and better as n→∞.
Or, resolving this for e,
e^(1/n)−1 ≅ 1/n or
e^(1/n) ≅ 1+1/n or
e ≅ (1+1/n)^n
Returning to exact equality, we can write the value of e as the following limit:
e = lim[n→∞] (1+1/n)^n

Summarizing the above logic, we have determined a specific value of the base of the exponential function with the steepness of 1 at point x=0. This value is designated a symbol e and is equal to
e = lim[n→∞] (1+1/n)^n
Thus, we have defined an exponential function
F(x) = e^x
Obviously, all properties of exponential functions are preserved in this particular function, including but not limited to:
e^0 = 1
e^1 = e
e^(x+y) = e^x·e^y
e^(x·y) = (e^x)^y

Another interesting fact is that steepness, as we introduced it for function F(x)=e^x at point x=0, can be defined at any other point using exactly the same methodology.
Consider any point x=x0. Now step forward from this point by a value 1/n to a point x=x0+1/n. Construct a ratio between the difference in the values of a function in these two points and the difference in the values of an argument:
β = [F(x0+1/n) − F(x0)] / (1/n)
As n→∞, the limit of this ratio characterizes the steepness of a function F(x) at point x0.

In case of the function we have introduced in this lecture, F(x)=e^x, the steepness at any point x0 equals to a limit of the following expression, as n→∞:
[e^(x0+1/n) − e^x0] / (1/n) =
= [e^(x0)·e^(1/n) − e^(x0)] / (1/n) =
= e^(x0)·[e^(1/n) − 1] / (1/n)
As n→∞, e^(x0) remains a constant while the expression
[e^(1/n) − 1] / (1/n), which represents the steepness of function e^x at point x=0, has a limit of 1.
Therefore, the steepness of function e^x at any point x0 equals to a value of this function at this same point, that is e^(x0). This is quite a remarkable and unique property of function F(x)=e^x.
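This property can be observed numerically by evaluating the ratio for a large n at several points. A sketch in Python; the function name `steepness` and the sample points are chosen only for this illustration.

```python
import math

def steepness(f, x0: float, n: int) -> float:
    """[f(x0 + 1/n) - f(x0)] / (1/n) for a given function f."""
    return (f(x0 + 1 / n) - f(x0)) / (1 / n)

for x0 in (-1.0, 0.0, 2.0):
    # The ratio for a large n next to the value of e^x0 itself.
    print(x0, steepness(math.exp, x0, 10**6), math.exp(x0))
```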

## Thursday, December 11, 2014

### Unizor - Trigonometry - Representation of Complex Numbers

We strongly advise students to refresh the knowledge of Complex Numbers. The lectures dedicated to these numbers are presented in the Algebra part of this course. Special attention should be dedicated to graphical representation of complex numbers.

Here is the main concept of the graphical representation of complex numbers.
We can always consider a complex number z = a + b·i as a pair of two real numbers (a, b) and each such pair (i.e. each complex number) we can put into a correspondence with a point on a coordinate plane with Cartesian coordinates (a, b).

But every point on a plane can be identified not only by its Cartesian coordinates, but a pair of polar coordinates - the distance from the origin r and a polar angle φ.
The conversion from polar coordinates into Cartesian is given by simple trigonometric formulas:
a = r·cos(φ)
b = r·sin(φ)
The conversion from Cartesian coordinates to polar is also fully defined by the following equalities:
r = √(a^2+b^2)
cos(φ) = a/√(a^2+b^2)
sin(φ) = b/√(a^2+b^2)

Hence, each complex number a+b·i can be represented in the polar form as
r·cos(φ)+r·sin(φ)·i = r·[cos(φ)+i·sin(φ)]
In this form the distance from the origin r is called magnitude or modulus, or absolute value of a complex number, while the polar angle φ is called an argument or a phase.
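Both conversions can be written as a few lines of code. In this Python sketch the function names are made up for the illustration, and `math.atan2` is used as one convenient way to find an angle φ that satisfies both equalities for cos(φ) and sin(φ) at once (it returns an angle between −π and π).

```python
import math

def to_polar(a: float, b: float) -> tuple[float, float]:
    """Cartesian (a, b) -> polar (r, phi)."""
    r = math.hypot(a, b)    # sqrt(a^2 + b^2)
    phi = math.atan2(b, a)  # angle with cos(phi)=a/r, sin(phi)=b/r
    return r, phi

def to_cartesian(r: float, phi: float) -> tuple[float, float]:
    """Polar (r, phi) -> Cartesian (a, b)."""
    return r * math.cos(phi), r * math.sin(phi)

r, phi = to_polar(-3.0, 4.0)
print(r, phi)
print(to_cartesian(r, phi))  # back to (-3, 4), up to rounding errors
```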

Incidentally, real numbers can be considered as a subset of complex numbers with an imaginary component (that is, the coefficient at i) equal to zero. In polar form it means that the argument equals 0 for positive real numbers or π for negative real numbers. In both cases the imaginary part equals 0 since sin(0)=sin(π)=0.

While it's easy to add two complex numbers in their traditional form, their product looks much more complicated.

Here is a sum:
(a1+b1·i) + (a2+b2·i) =
= (a1+a2) + (b1+b2)·i

And here is a product:
(a1+b1·i) · (a2+b2·i) =
=a1·a2+a1·b2·i+a2·b1·i+b1·b2·i^2=
= (a1·a2−b1·b2)+(a1·b2+a2·b1)·i
because i^2 = −1

The situation with product is much simpler in the polar form of representation of complex numbers:
r1·[cos(φ1)+i·sin(φ1)] ·
r2·[cos(φ2)+i·sin(φ2)] =
= r1·r2·[cos(φ1)·cos(φ2) +
+ i·cos(φ1)·sin(φ2) +
+ i·sin(φ1)·cos(φ2) +
+ i^2·sin(φ1)·sin(φ2)] =
= r1·r2·{[cos(φ1)·cos(φ2) − sin(φ1)·sin(φ2)] +
+ i·[cos(φ1)·sin(φ2) + sin(φ1)·cos(φ2)]} =
= r1·r2·[cos(φ1+φ2)+i·sin(φ1+φ2)]
So, the magnitudes are multiplied but arguments are added. The final formula is relatively simple.
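The rule "multiply the magnitudes, add the arguments" can be checked against ordinary complex multiplication. A minimal Python sketch; `multiply_polar` and the sample numbers are assumptions of this illustration.

```python
import cmath

def multiply_polar(r1: float, phi1: float, r2: float, phi2: float) -> tuple[float, float]:
    """Product of two complex numbers given in polar form, also in polar form."""
    return r1 * r2, phi1 + phi2

z1 = cmath.rect(2.0, 0.5)  # 2*[cos(0.5) + i*sin(0.5)]
z2 = cmath.rect(3.0, 1.2)  # 3*[cos(1.2) + i*sin(1.2)]

r, phi = multiply_polar(2.0, 0.5, 3.0, 1.2)
# Ordinary product next to the product rebuilt from the polar rule.
print(z1 * z2, cmath.rect(r, phi))
```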

There is a very clear geometric meaning of multiplication of complex numbers in polar form.
Consider only complex numbers that have a magnitude of 1, that is all numbers of a form
cos(φ)+i·sin(φ).
They all lie on a unit circle around the origin of coordinates.

When one such number with an argument φ1 is multiplied by another with an argument φ2, the result will be a new complex number with a magnitude of 1, still on the same unit circle, and an argument equal to φ1+φ2.
It means that algebraic operation of multiplication is, geometrically, a rotation.
Remarkable, isn't it?

If we consider multiplication by a complex number with a magnitude not equal to 1, geometrically, we still deal with a rotation, but also deal with a stretching by a factor equal to a magnitude of a multiplier.
Multiplication by a positive real number is only a stretching since positive real numbers, considered as a subset of complex numbers, have argument equaled to zero in polar form and, therefore, there is no rotation.
Multiplication by a negative real number is a stretching with a change in the direction to an opposite since negative real numbers, considered as a subset of complex numbers, have argument equaled to π in polar form, which, if added, changes the direction to an opposite.

Multiplication by i is a rotation by π/2=90° because, in polar form,
i = 0+1·i = cos(π/2)+i·sin(π/2),
that is, i is a complex number with magnitude 1 and argument π/2 in polar form.
Therefore, multiplication by i is a rotation by an angle π/2.
Incidentally, this is obvious in a traditional representation of complex numbers since
(a+b·i)·i = −b+a·i
and a segment connecting the origin of coordinates and a point (a,b) should be rotated by an angle π/2 around the origin of coordinates to coincide with a segment from the origin of coordinates to a point (−b,a).
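The rotation by π/2 is easy to see on a concrete point. A tiny Python sketch, with the helper name `times_i` invented for this illustration:

```python
def times_i(a: float, b: float) -> tuple[float, float]:
    """Multiply a+b*i by i and return the result as a point on the plane."""
    z = complex(a, b) * 1j
    return z.real, z.imag

print(times_i(3.0, 4.0))  # the point (3, 4) goes to (-4, 3)
```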

## Monday, November 24, 2014

### Unizor - Probability - Normal Distribution - Using Sigma

Consider a game against a house in which you either win 1 bitcoin or lose 1 bitcoin, with equal probabilities of 1/2.
Now suppose you decide to play a series of 100 games. How much money must you have in reserve to be able to pay the balance if you are at a loss after this series of games?

The simplest answer is to have 100 bitcoins in reserve to be able to pay for 100 losses in a row. In theory, if you want to assure with the probability of 100% that you will be able to pay, this is the only answer. But, at the same time, you might think that it would be good enough to assure the ability to pay with a relatively high probability, accepting a small risk to lose your good standing with the house. How much do you have to have in reserve then? Could you substantially reduce the amount of money required then?

The normal distribution and sigma limits can be of help in this case.

An individual game can be modeled as a Bernoulli random variable ξ taking values +1 and −1 with equal probabilities of 1/2.
The result of a series of 100 games is a sum of 100 such variables:
η = ξ1+ξ2+...+ξ100.
All members of this sum are independent and identically distributed according to Bernoulli distribution taking values +1 and −1 with equal probabilities of 1/2.

To evaluate the number of bitcoins you have to have to be able to pay off your losses with a relatively high probability, you have to establish your risk limits. For instance, you decide that if the probability of losing greater than S bitcoins is less than 0.025, it's good enough for you.
To find S (positive amount of bitcoins to have in reserve), we have to satisfy the inequality P(η ≤ −S) ≤ 0.025.

This is not a simple task. You have to find the probabilities P(η=K) for K=−100, K=−98, K=−96 etc. and add them up one by one until their sum exceeds 0.025. The last value of K will be the maximum loss you can pay, ignoring all greater losses since their accumulated probability does not exceed 0.025.

As you see, this process is tedious, cumbersome, long, time consuming etc.
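For reference, the summation described above is easy to delegate to a computer. Below is a minimal sketch in Python; the function name `prob_loss_at_least` is made up for this illustration, and the count of winning games is used to express the final balance.

```python
from math import comb

def prob_loss_at_least(s: int, games: int = 100) -> float:
    """P(eta <= -s), where eta is the sum of `games` fair +1/-1 results."""
    favorable = 0
    for wins in range(games + 1):
        balance = wins - (games - wins)  # +1 per win, -1 per loss
        if balance <= -s:
            favorable += comb(games, wins)  # number of series with this many wins
    return favorable / 2 ** games

for s in (10, 20, 30):
    print(s, prob_loss_at_least(s))
```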

There is a better way!

As we know, the distribution of a sum of many independent identically distributed random variables resembles the normal distribution. Which one? The one with an expectation equal to a sum of expectations of each component of a sum and a variance equal to a sum of their variances.

In our case the expectation of each Bernoulli random variable equals to
E(ξ)=(1)·(1/2)+(−1)·(1/2)=0
Its variance is
Var(ξ) =
=(1−0)^2·(1/2)+(−1−0)^2·(1/2)=1
Therefore, expectation and variance of a sum of 100 such variables are:
E(η) = 100·E(ξ) = 0
Var(η) = 100·Var(ξ) = 100

Let's assume now that random variable η is truly normal. Obviously, this assumption needs justification, but we are looking for approximate values, so, for our purposes, it's quite a valid assumption. Then we can use the sigma limits for a normal random variable. Knowing the variance, we calculate the standard deviation as a square root of the variance:
σ(η) = 10
And now we are ready to evaluate the probability of the sum of 100 results of the Bernoulli experiments based on sigma limit.

Recall the sigma limits for a normal random variable with expectation μ and standard deviation σ:
1. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−σ, μ+σ] (the narrowest interval of these three) approximately equals to 0.6827.

2. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−2σ, μ+2σ] (the wider interval) approximately equals to 0.9545.

3. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−3σ, μ+3σ] (the widest interval of these three) approximately equals to 0.9973.

Consider the 2σ limit and apply it for our case of μ=0 and σ=10.
Since the probability of a random variable to be in the interval 2σ, that is in the interval [−20,20], equals to 0.9545, the probability to fall outside of this interval (less than −20 or greater than 20) equals to 0.0455. Since normal distribution is symmetrical, the probability to be less than −20 equals to the probability to be greater than 20 and equals to half of that value, that is, approximately, 0.0227.

Therefore, we can say that with the probability of about 0.9773 we will not lose more than 20 bitcoins. As you see, the calculations are simple and, accepting a little risk (a loss exceeding the reserve has a probability of about 0.0227), we can reduce the required amount of reserve from 100 bitcoins for the no-risk strategy to only 20 bitcoins.
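The tail probability used here can be computed directly under the same assumption that η is normal with μ=0 and σ=10. The sketch below is in Python; `normal_cdf` is a helper written for this illustration, based on the standard library function `math.erf`.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability that a normal variable with mean mu and std. deviation sigma is below x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

p_big_loss = normal_cdf(-20, mu=0, sigma=10)  # losing more than 20 bitcoins
print(p_big_loss, 1 - p_big_loss)
```

The first printed number is the accepted risk, the second is the probability that a reserve of 20 bitcoins is sufficient.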

## Friday, November 21, 2014

### Unizor - Probability - Normal Distribution - Sigma Limits

As we know, the normal distribution of a random variable (or the distribution of probabilities for a normal random variable) is defined by two parameters:
expectation (or mean) μ and
standard deviation σ.

The expectation defines the center of a bell curve that represents the distribution of probabilities.
The standard deviation defines the steepness of this curve around this center - smaller σ corresponds to a steeper central part of a curve, which means that values around μ are more probable.

Our task is to evaluate the probabilities of the normal variable to take values within certain interval around its mean μ based on the value of its standard deviation σ.

Consider we have these two parameters, μ and σ, given and, based on their values, we have constructed a bell curve that represents the distribution of probabilities of a normal variable with these parameters. Let's choose some positive constant d and mark three points, A, M and B, on the X-axis with coordinates, correspondingly, μ−d, μ and μ+d.
Point M(μ) is at the center of our bell curve.
Points A(μ−d) and B(μ+d) are on both sides from a center on equal distance d from it.

The area under the entire bell curve equals to 1 and represents the probability of our normal random variable to take any value.
The area under the bell curve restricted by a point A on the left and a point B on the right represents the probability of our random variable to take value in the interval AB.
We have specifically chosen points A and B symmetrical relatively to a midpoint M because the bell curve has this symmetry.

It is obvious that the wider interval AB is - the greater the probability of our random variable to take a value within this interval. Since the area under the bell curve restricted by points A and B around the center M depends only on its width (defined by the d constant) and the steepness of a curve, let's measure the width using the same parameter that defines the steepness, the standard deviation σ. This will allow us to evaluate probabilities of a normal random variable to take values within certain interval based only on one parameter - its standard deviation σ.

Traditionally, there are three different intervals around the mean value μ considered to evaluate the values of normal random variable:
d=σ, d=2σ and d=3σ.
Let's quantify them all.

1. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−σ, μ+σ] (the narrowest interval of these three) approximately equals to 0.6827.

2. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−2σ, μ+2σ] (the wider interval) approximately equals to 0.9545.

3. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−3σ, μ+3σ] (the widest interval of these three) approximately equals to 0.9973.
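These three values do not depend on μ and σ and can be reproduced with a few lines of code. A sketch in Python; it assumes the known relation between the normal distribution and the error function `math.erf` from the standard library, and the name `prob_within` is made up for this illustration.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```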

As you see, the value of a normal variable can be predicted with the greatest probability when we choose the widest interval of the three mentioned - the 3σ-interval around its mean. The value will fall into this interval with a very high probability.

Narrower 2σ-interval still maintains a relatively high probability to have a value of our random variable fallen into it.

The narrowest σ-interval has this probability not much higher than 0.5, which makes the prediction for the value of our random variable to fall into it not very reliable.

## Tuesday, November 18, 2014

### Unizor - Probability - Random Variables - Problems 3

Problem 3.1.
Given a random variable. Then a new random variable is formed as a product of a constant by this random variable.
Prove that an expectation (mean) of a product of a constant by a random variable equals to a product of that constant by an expectation of that random variable:
E(a·ξ) = a·E(ξ)

Proof
Recall that an expectation of a random variable is a weighted average of its values with corresponding probabilities as weights.
Assume that random variable ξ takes values x1, x2,...xN with probabilities, correspondingly, p1, p2,...pN.
Then
E(ξ) = x1·p1 + x2·p2 +...+ xN·pN
Random variable a·ξ takes values a·x1, a·x2,...a·xN with probabilities, correspondingly, p1, p2,...pN.
Then
E(a·ξ) = a·x1·p1+a·x2·p2+...+a·xN·pN =
= a·(x1·p1 + x2·p2 +...+ xN·pN) = a·E(ξ)

Problem 3.2.
Given a random variable. Then a new random variable is formed as a product of a constant by this random variable.
Prove that a variance of a product of a constant by a random variable equals to a product of a square of that constant by a variance of that random variable:
Var(a·ξ) = a^2·Var(ξ)

Proof
Recall that a variance of a random variable is an expectation of a square of deviation of this random variable from its expected value (mean). As a formula, it looks like this:
Var(ξ) = E{[ξ−E(ξ)]^2}
Assume that random variable ξ takes values x1, x2,...xN with probabilities, correspondingly, p1, p2,...pN.
Let the expectation (mean) of this random variable be E(ξ)=μ.
Then
Var(ξ) = (x1−μ)^2·p1 + (x2−μ)^2·p2 +...+ (xN−μ)^2·pN
Random variable a·ξ takes values a·x1, a·x2,...a·xN with probabilities, correspondingly, p1, p2,...pN.
As we know from a previous problem,
E(a·ξ) = a·E(ξ) = a·μ
Then
Var(a·ξ) = (a·x1−a·μ)^2·p1 + (a·x2−a·μ)^2·p2 +...+ (a·xN−a·μ)^2·pN =
= a^2·(x1−μ)^2·p1 + a^2·(x2−μ)^2·p2 +...+ a^2·(xN−μ)^2·pN =
= a^2·Var(ξ)

Problem 3.3.
Given a random variable. Then a new random variable is formed as a product of a constant by this random variable.
Prove that a standard deviation of a product of a constant by a random variable equals to a product of an absolute value of that constant by a standard deviation of that random variable:
σ(a·ξ) = |a|·σ(ξ)

Proof
Recall that a standard deviation of a random variable is a (non-negative) square root of its variance.
Therefore,
σ(a·ξ) = √[Var(a·ξ)] = √[a^2·Var(ξ)] = |a|·√[Var(ξ)] = |a|·σ(ξ)
Notice the absolute value around a constant a. Even if this constant is negative, a standard deviation is always non-negative, as is a variance of a random variable.
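The three statements of Problems 3.1, 3.2 and 3.3 can be verified on a concrete example. The sketch below is in Python with exact fractions; the values, probabilities and the constant a=−3 are invented for this illustration.

```python
import math
from fractions import Fraction

def expectation(values, probs):
    """Weighted average of values with probabilities as weights."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Expectation of the squared deviation from the mean."""
    mu = expectation(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

values = [Fraction(1), Fraction(2), Fraction(5)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
a = Fraction(-3)
scaled = [a * x for x in values]  # values of the random variable a*xi

print(expectation(scaled, probs), a * expectation(values, probs))
print(variance(scaled, probs), a ** 2 * variance(values, probs))
print(math.sqrt(variance(scaled, probs)), abs(a) * math.sqrt(variance(values, probs)))
```

Each line prints two numbers that coincide; with the negative constant, the standard deviation still comes out positive.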

Problem 3.4.
Given N independent identically distributed random variables ξ1, ξ2,...ξN, each having the expectation (mean) μ and the standard deviation σ.
Calculate the expectation, variance and standard deviation of their average
η = (ξ1 + ξ2 +...+ ξN)/N

E(η) = μ
Var(η) = σ^2/N
StdDev(η) = σ/√N

Hint
Use the additive property of an expectation relative to a sum of any random variables and an additive property of a variance relative to a sum of independent random variables.
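The answer can be confirmed by brute force on a small example. In the Python sketch below the random variables are rolls of a fair die (an assumption made for this illustration), and all equally probable outcomes of N rolls are enumerated to get the exact mean and variance of their average.

```python
from fractions import Fraction
from itertools import product

def mean_and_variance_of_average(faces, n: int):
    """Exact mean and variance of the average of n independent fair rolls."""
    p = Fraction(1, len(faces) ** n)  # probability of each series of n rolls
    averages = [Fraction(sum(outcome), n) for outcome in product(faces, repeat=n)]
    mean = sum(avg * p for avg in averages)
    var = sum((avg - mean) ** 2 * p for avg in averages)
    return mean, var

faces = range(1, 7)
for n in (1, 2, 3):
    print(n, mean_and_variance_of_average(faces, n))
```

The mean stays the same for every N, while the variance for N rolls is exactly N times smaller than the variance for one roll.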

IMPORTANT CONSEQUENCE

Averaging of independent identically distributed random variables produces a new random variable with the same mean (expectation) as the mean of the original random variables, but the standard deviation of that average is smaller than the standard deviation of each original random variable by a factor of √N, where N is the number of random variables participating in the averaging.

Therefore, the process of averaging produces more and more precise estimate of the expectation of a random variable as the number of random variable grows. This is a basis of most statistical calculations.

For example, if one person measures a length of some object, there is always an error of his measuring. His result is a true length plus some error introduced by his measurement (positive or negative). So, we can consider his measure as a random variable with certain mean - a true length of an object and a random error with some standard deviation that depends on his measurement tool and accuracy. Thus, if we measure with a ruler the length of a table with a true length of 1 meter, we can say that the error might be no greater than 1 centimeter, and our results will be from 99 to 101 centimeters.

But if this person measures the length of this object 100 times and the results of these measurements are averaged, the final result will be much closer to the real value of the length of the object since the mean of this average is the same as the mean of every measurement, that is the true length of an object, but the standard deviation around this true length will be √100=10 times smaller than that of each measurement. That assures the precision of our evaluation of the length of the table to be within 1 millimeter from the real length.

### Unizor - Probability - Normal Distribution - Normal is the Limit

Let's start with formulating again the Central Limit Theorem of the theory of probabilities that illustrates the importance of the Normal distribution of probabilities.

In non-rigorous terms, the Central Limit Theorem states that, given certain conditions, the average of a large number of random variables behaves approximately like a normal random variable.

One of the simplest sufficient conditions, for example, is the requirement about random variables participating in averaging to be independent, identically distributed with finite expectation and variance.
Throughout the history of development of the theory of probabilities the Central Limit Theorem was proven for weaker and weaker conditions. Even if individual variables are not completely independent, even if their probability distributions are not the same, the theorem can still be proven. Arguably, the history of the development of the theory of probabilities is the history of proving the Central Limit Theorem under weaker and weaker conditions.

Rigorous proof of this theorem, even in the simplest case of averaging independent identically distributed random variables, is outside of the scope of this course. However, the illustrative examples are always possible.

Consider a sequence of N independent Bernoulli experiments with a probability of success equal to 1/2. Their results are the random variables ξi, each taking two values, 0 and 1, with equal probability of 1/2 (index i is in the range from 1 to N).

Now consider their sum
η = ξ1+ξ2+...+ξN

From the lecture about Bernoulli distribution we know that random variable η can take any value from 0 to N with the probability to take a value K (K is in the range from 0 to N) equal to
P(η=K) = C(N,K)·p^K·q^(N−K)
where p is the probability of SUCCESS and q=1−p is the probability of FAILURE.

Let's graph the distribution of probabilities of our random variable η for different values of N with probabilities p = q = 1/2.

N=1: η = ξ1
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/2 on [0,1],
1/2 on [1,2] and
zero after x=2.

N=2: η = ξ1+ξ2
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/4 on [0,1],
1/2 on [1,2],
1/4 on [2,3] and
zero after x=3.

N=3: η = ξ1+ξ2+ξ3
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/8 on [0,1],
3/8 on [1,2],
3/8 on [2,3],
1/8 on [3,4] and
zero after x=4.

N=4: η = ξ1+ξ2+ξ3+ξ4
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/16 on [0,1],
4/16 on [1,2],
6/16 on [2,3],
4/16 on [3,4],
1/16 on [4,5] and
zero after x=5.

N=5: η = ξ1+ξ2+ξ3+ξ4+ξ5
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/32 on [0,1],
5/32 on [1,2],
10/32 on [2,3],
10/32 on [3,4],
5/32 on [4,5],
1/32 on [5,6] and
zero after x=6.

N=6: η = ξ1+ξ2+ξ3+ξ4+ξ5+ξ6
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/64 on [0,1],
6/64 on [1,2],
15/64 on [2,3],
20/64 on [3,4],
15/64 on [4,5],
6/64 on [5,6],
1/64 on [6,7] and
zero after x=7.
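The probabilities listed above for N from 1 to 6 follow from the Bernoulli formula and can be generated by a short program. A sketch in Python with exact fractions; the function name `binomial_distribution` is chosen for this illustration.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n: int, p: Fraction = Fraction(1, 2)) -> list:
    """Probabilities P(eta=K) for K = 0..n in n Bernoulli experiments."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

for n in range(1, 7):
    # Fractions are printed in reduced form, for example 6/16 appears as 3/8.
    print(n, [str(x) for x in binomial_distribution(n)])
```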

The above graphs obviously more and more resemble the bell curve. When squeezed by a factor of N (to transform a sum of N random variables into their average), all the graphs will be greater than zero only in a segment [0,1] and inside this segment, as N grows, the graphs will be closer and closer to a bell curve.
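The resemblance can be quantified for a larger N by comparing the probabilities of the sum with the normal density that has the same mean N/2 and variance N/4. The Python sketch below assumes the standard formula of the normal density, which is not derived in this lecture; the function names are made up for this illustration.

```python
import math

def binomial_pmf(n: int, k: int) -> float:
    """P(eta=k) for a sum of n Bernoulli variables with p = q = 1/2."""
    return math.comb(n, k) / 2 ** n

def normal_density(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution with mean mu and std. deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

n = 100
mu, sigma = n / 2, math.sqrt(n) / 2  # mean N/2, variance N/4
for k in (40, 45, 50, 55, 60):
    print(k, binomial_pmf(n, k), normal_density(k, mu, sigma))
```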

## Monday, November 10, 2014

### Unizor - Probability - Normal Distribution

In some way the Normal distribution of probabilities is the most important one in an entire Theory of Probabilities and Mathematical Statistics.

Everybody knows about the bell curve. This is a graphical representation of the Normal distribution of probabilities, where the probability of a random variable to be between the values A and B is measured as an area under a bell curve from an abscissa A to an abscissa B.
Not all bell-shaped distributions of probabilities are Normal; there are many others. However, all Normal distributions have a bell-like graph, differing only in the position of the center and the steepness of the curve.
Of course, the entire area under a bell curve from negative infinity to positive infinity equals 1, since this is the probability of a random variable to have any value.

Normal distribution is a continuous type of distribution. Random variable with this distribution of probabilities (we will call it sometimes normal random variable) can take any real value. The probability of normal random variable to take any exact specific value equals to zero, while the probability it takes a value in the interval from one real number (left boundary) to another (right boundary) is greater than zero.

Exactly in the middle of a graph is the expected value (mean) of our random variable. Variance, being a measure of deviation of a random variable from its mean value, depends on how steep the graph increases towards its middle. The steeper the bell-shaped graph around its middle - the more concentrated the values of a normal random variable are around its mean value and, therefore, the smaller variance and standard deviation it has.

What makes a normal random variable so special?

In short (and not very mathematically speaking), average of many random variables of almost any distributions of probabilities behaves very much like a normal random variable.

Obviously, this property of an average of random variables is conditional. A necessary condition is that the number of random variables participating in the average should be large: the more random variables are averaged together, the more the distribution of the average resembles the normal distribution.
More precisely, this is a limit theorem, which means that the measure of "normality" of a distribution of an average of random variables is increasing with the number of participating random variables increasing to infinity, and the limit of a distribution of this average is exactly the normal distribution of probabilities.
The precise mathematical theorem that states this property of an average of random variables is called the Central Limit Theorem in the theory of probabilities.

There are other conditions for this theorem to hold. For instance, independence and identical distribution of random variables participating in the averaging is a sufficient (but not necessary) condition.
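
To see this limit behavior numerically, here is a minimal sketch. It assumes the simplest case of independent Bernoulli variables with p = q = 1/2 and compares the cumulative distribution of their sum with the normal distribution that has the same mean N/2 and standard deviation √N/2. Dividing the sum by N to get the average only rescales the X-axis and does not change this comparison. The names `normal_cdf` and `max_cdf_gap` are illustrative.

```python
from math import comb, erf, sqrt

def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def max_cdf_gap(n):
    """Largest gap, over K = 0..n, between the cumulative probabilities of
    a sum of n fair Bernoulli variables and those of the normal
    distribution with the same mean n/2 and standard deviation sqrt(n)/2."""
    mean, sigma = n / 2, sqrt(n) / 2
    cumulative, gap = 0.0, 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) / 2 ** n  # P(sum = k)
        gap = max(gap, abs(cumulative - normal_cdf((k - mean) / sigma)))
    return gap

for n in (10, 100, 1000):
    print(n, round(max_cdf_gap(n), 4))
```

The printed gap shrinks as N grows, which is what the limit theorem describes. This is only an illustration for one particular distribution, not a proof.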

## Wednesday, October 29, 2014

### Unizor - Probability - Geometric Distribution - Properties

Definition

Recall the definition of the Geometric distribution of probabilities.
Assume that we conduct a sequence of independent random experiments - Bernoulli trials with the probability of SUCCESS p - with the goal to reach the first SUCCESS. The number of trials to achieve this goal is, obviously, a random variable. The distribution of probabilities of this random variable is called Geometric.

Formula for
Distribution of Probabilities

Recall from a previous lecture the formula for the probability of a random variable distributed Geometrically to take a value of K:
P(γ[p]=K) = (1−p)^(K−1)·p

Graphical Representation

Our random variable can take any integer value from 1 to infinity with the probability expressed by the above formula.
The graphical representation of this distribution of probabilities consists of a sequence of rectangles with the bases [0,1], [1,2], etc. and the height of the Kth rectangle equal to (1−p)^(K−1)·p.
This resembles a staircase with gradually decreasing height of the steps from p for the first down to 0 as we move farther and farther from the beginning.

Expectation (Mean)

The expectation of a random variable that takes values x1, x2, etc. with probabilities p1, p2, etc. is a weighted average of its values with probabilities as weights:
E = x1·p1+x2·p2+...

In our case, considering the random variable can take any integer value from 1 to infinity with the probability described by the formula above, its expectation equals to
E(γ[p]) =
1·(1−p)^0·p +
2·(1−p)^1·p +
3·(1−p)^2·p +
4·(1−p)^3·p + ...

To calculate the value of this expression, multiply both sides by a factor (1−p):
(1−p)·E(γ[p]) =
1·(1−p)^1·p +
2·(1−p)^2·p +
3·(1−p)^3·p+...
and subtract the result from the original sum
E(γ[p]) − (1−p)·E(γ[p]) =
(1−p)^0·p +
(1−p)^1·p +
(1−p)^2·p +...

On the left of this equation we have p·E(γ[p]).
On the right we have a geometric series that converges to p/[1−(1−p)]=1.

Therefore, p·E(γ[p]) = 1
And the mean value (expectation) of γ[p] is
E(γ[p]) = 1/p

This value of the expectation is intuitively correct since, when the probability of SUCCESS is greater, we expect to get it sooner, in the smaller number of trials on average. Also, if the probability of SUCCESS equals to 1, which means that we cannot get FAILURE at all, we expect to get SUCCESS on the first trial. Finally, if the probability of SUCCESS is diminishing, we expect to get it on average later, with more and more trials.
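
The result E(γ[p]) = 1/p can be checked numerically by adding up many terms of the series. The sketch below is only such a check under the assumption that 10,000 terms are enough for the tail to be negligible for the chosen values of p. The function name `geometric_mean_partial` is illustrative.

```python
def geometric_mean_partial(p, terms=10_000):
    """Partial sum of K * (1-p)^(K-1) * p for K = 1..terms."""
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, terms + 1))

for p in (0.1, 0.4, 0.5, 1.0):
    # The two printed numbers should agree up to rounding.
    print(p, geometric_mean_partial(p), 1 / p)
```

For p=1 only the first term is non-zero, which matches the remark that SUCCESS is then certain on the first trial.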

Variance, Standard Deviation

Variance of a random variable that takes values x1, x2, etc. with probabilities p1, p2, etc. is a weighted average of squares of deviations of the values of our random variable from its expected value E:
Var = (x1−E)^2·p1+(x2−E)^2·p2+...

In our case, considering the random variable can take any integer value from 1 to infinity with the probability described by the formula above and its expectation equals to 1/p, the variance equals to
Var(γ[p]) =
(1−1/p)^2·(1−p)^0·p +
(2−1/p)^2·(1−p)^1·p +
(3−1/p)^2·(1−p)^2·p +
(4−1/p)^2·(1−p)^3·p + ...

Reducing this complex expression to a short form involves a lot of calculations. Thankfully, it was done by mathematicians and documented numerous times. The idea is similar to what we used for calculating the mean value: multiplying the sum by the common ratio (1−p) of a geometric progression and subtracting the resulting sum from the original. The final formula is:
Var(γ[p]) = (1−p)/(p^2)

For probability values from 0 to 1 this expression is never negative. It equals 0 when the probability of SUCCESS equals 1 (as it should, since we always have SUCCESS on the first trial with no deviation from this). As the probability of SUCCESS diminishes to 0, the variance increases to infinity (as it should, since on average it would take more and more trials to reach SUCCESS).

As for the standard deviation, it is equal to a square root of the variance:
σ(γ[p]) = [√(1−p)]/p
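
Since the derivation of the variance is skipped here, a numerical check may be reassuring. The following sketch adds up the first 10,000 terms of the series for Var(γ[p]) and compares the result with (1−p)/p^2. The value p=0.4 and the function name `geometric_variance_partial` are assumptions made for illustration.

```python
from math import sqrt

def geometric_variance_partial(p, terms=10_000):
    """Partial sum of (K - 1/p)^2 * (1-p)^(K-1) * p for K = 1..terms."""
    mean = 1 / p
    return sum((k - mean) ** 2 * (1 - p) ** (k - 1) * p
               for k in range(1, terms + 1))

p = 0.4
variance = geometric_variance_partial(p)
print(variance, (1 - p) / p ** 2)       # series versus the formula
print(sqrt(variance), sqrt(1 - p) / p)  # the same for standard deviation
```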

### Unizor - Probability - Geometric Distribution - Definition

The Geometric distribution of probabilities is related to Bernoulli trials. Its definition is based on a concept of the first SUCCESS - the number of experiments needed to reach the first SUCCESSful result of Bernoulli trials. In theory, if FAILURE occurs time after time (which is not impossible if the probability of FAILURE is not zero), the number of experiments needed to reach the first SUCCESS can be unlimitedly large.

We will analyze the behavior of this number, more precisely, we will analyze the behavior of a random variable equal to the number of experiments needed to reach the first SUCCESS in a series of independent Bernoulli trials with a probability of SUCCESS p and, correspondingly, a probability of FAILURE q=1−p.

Example 1
Suppose you want to win a lottery, but your inner voice tells you that lottery is a form of gambling and, on average, you would lose money. Still, you want to try it and you make an agreement with your inner voice to play as many times as needed until the first winning. After that - no lottery gambling.
This is a typical example of the Geometric distribution. The number of tickets you have to buy to reach the first winning is exactly that random variable we talked about in the above definition.

Example 2
Assume a couple wants to have a daughter. If a son is born, they try again and again until a daughter is born. If the probability of giving birth to a daughter is not zero, this process will end after some number of trials. The number of children born in this family until they have a daughter is a random variable with Geometric distribution of probabilities.

Example 3
A student wants to pass a test that consists of certain number of questions. He knows answers to some of them, but not all. If he gets a question he knows, he will pass, otherwise he has to come again for a test.
Let's assume that a student does not study in between the tests, so the probability of getting a familiar question is the same. Then the number of attempts he makes to pass the test is a random variable with Geometric distribution of probabilities.

Now we are ready to precisely define the Geometric distribution.
Our sample space of elementary events can be represented as a set of strings of letters S (for a Bernoulli trial that results in SUCCESS) and F (for FAILUREs). Each string has certain number of letters F in the beginning and a single letter S at the end. This models a series of independent Bernoulli trials that last until the result is SUCCESS.

The random variable with Geometric distribution in this model, that is a numeric function defined for each elementary event, is the length of a string that represents this elementary event. Our task is to determine the probability of this length to be equal to some positive integer value.

The probability of our random variable to be equal to some number K is the probability to have K−1 FAILUREs in a series of independent Bernoulli trials followed by a SUCCESS.
If the result of the ith Bernoulli trial is a random variable β[i] with values S (SUCCESS) or F (FAILURE), we have to determine the following probability:
P(β[1]=F AND β[2]=F AND...AND β[K−1]=F AND β[K]=S)
As we know, the probability of a combination of independent events equals a product of their individual probabilities. Therefore, our probability equals
P(β[1]=F)·P(β[2]=F)·...·P(β[K−1]=F)·P(β[K]=S)
which, in turn, equals to
(1−p)^(K−1) · p

Let's denote our random variable with Geometric distribution γ[p] (index p signifies the probability of SUCCESS in each Bernoulli trial that participates in the definition of the Geometric distribution). Then we can describe the distribution of probabilities for this random variable as
P(γ[p]=K) = (1−p)^(K−1) · p

The above formula completely defines the random variable with Geometric distribution and will be used in further analysis of its properties and characteristics.

For example, let's calculate a probability of winning a lottery on the first trial (K=1) if the probability of winning for a single lottery ticket is p=0.4:
P(γ[0.4]=1)=(1−0.4)^(1−1) · 0.4=0.4
How about winning on the 3rd attempt (K=3)?
P(γ[0.4]=3)=(1−0.4)^(3−1) · 0.4=0.144

In conclusion, let's check that the sum of probabilities of our random variable γ[p] to take all possible values equals 1.
All we have to do is to sum the expression (1−p)^(K−1) · p for all K from 1 to infinity. Obviously, this is an infinite geometric series. Its sum depends only on the first member a (which is equal to p) and the common ratio r (which is equal to 1−p) and is expressed as
S = a/(1−r)
In our case
S = p/[1-(1-p)] = p/p = 1
So, our definition of Geometric distribution of probabilities satisfies a necessary condition to sum up to 1.
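
The formula and the two lottery calculations above can be tried out with a few lines of Python. This is a minimal sketch. The function name `geometric_pmf` is illustrative, and the infinite sum is approximated by its first 199 terms.

```python
def geometric_pmf(k, p):
    """P(gamma[p] = k): k-1 FAILUREs followed by one SUCCESS."""
    return (1 - p) ** (k - 1) * p

print(geometric_pmf(1, 0.4))  # winning on the first trial
print(geometric_pmf(3, 0.4))  # winning on the third trial
# Partial sums of the probabilities approach 1 as more terms are added.
print(sum(geometric_pmf(k, 0.4) for k in range(1, 200)))
```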

## Monday, October 27, 2014

### Unizor - Probability - Binomial Distribution

Let's recall the definition of a binomial distribution with two parameters: the number of simultaneous independent Bernoulli trials N and the probability of SUCCESS in each of them p.
The Binomial distribution of probabilities is a distribution of a random variable ξ[N,p] that is equal to the number of SUCCESSful Bernoulli trials. This random variable, obviously, takes values from 0 to N with different probabilities, and the Binomial distribution of probabilities is a distribution of probabilities among these N+1 values.

Example
Consider a case with N=3, that is we make three independent Bernoulli trials and count the number of SUCCESSes ξ[3,p]. The number of SUCCESSes in this case is either 0 or 1, or 2, or 3. There are eight different outcomes of these three Bernoulli trials:
(F,F,F) and ξ[3,p](F,F,F)=0
(F,F,S) and ξ[3,p](F,F,S)=1
(F,S,F) and ξ[3,p](F,S,F)=1
(F,S,S) and ξ[3,p](F,S,S)=2
(S,F,F) and ξ[3,p](S,F,F)=1
(S,F,S) and ξ[3,p](S,F,S)=2
(S,S,F) and ξ[3,p](S,S,F)=2
(S,S,S) and ξ[3,p](S,S,S)=3
Since the probability of SUCCESS in any single independent Bernoulli trial is p and the probability of FAILURE is q=1−p, we can conclude that
P(ξ[3,p]=0) = P(F,F,F) = q^3
P(ξ[3,p]=1) = P(F,F,S)+P(F,S,F)+P(S,F,F) = 3·p·q^2
P(ξ[3,p]=2) = P(S,S,F)+P(S,F,S)+P(F,S,S) = 3·p^2·q
P(ξ[3,p]=3) = P(S,S,S) = p^3
Just for checking, the sum of all the probabilities must be equal to 1. Indeed,
p^3+3·p^2·q+3·p·q^2+q^3=(p+q)^3=1
If p=1/2 and we want to represent this distribution of probabilities graphically, we can construct four rectangles on the coordinate plane: one with a base [0,1] and the height q^3=1/8 (no SUCCESSes), another with a base [1,2] and the height 3·p·q^2=3/8, the third with a base [2,3] and the height 3·p^2·q=3/8 and the fourth with a base [3,4] and the height p^3=1/8 (three SUCCESSes).
Notice that the probabilities of ξ[3,p] taking different values are the terms of the expansion of (p+q)^3.

It's time to consider a general case.
Let ξ[N,p] represent the number of SUCCESSes in N independent Bernoulli trials with a probability of SUCCESS in each of them equal to p.
We want to find the probability of this random variable to take a value K, where K can be any number from 0 (all FAILUREs) to N (all SUCCESSes).
In order to have exactly K SUCCESSes as a result of this experiment, K Bernoulli trials have to be SUCCESSful and N−K trials have to be FAILUREs. Since we don't care exactly which trials succeed and which fail, to calculate our probability we have to add up the probabilities of all elementary events that contain K SUCCESSes and N−K FAILUREs.
The number of such elementary events equals the number of combinations of K objects out of N, that is C(N,K).
The probability of each elementary event that contains K SUCCESSes and N−K FAILUREs equals p^K·q^(N−K).
Therefore,
P(ξ[N,p]=K) = C(N,K)·p^K·q^(N−K)
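
The counting argument above can be verified by brute force for small N. The sketch below lists all strings of S and F of length N, adds up the probabilities of those with exactly K letters S, and compares the result with the formula. The value p=0.3 and the function names are assumptions made for illustration.

```python
from itertools import product
from math import comb

def binomial_pmf(n, k, p):
    """P(xi[N,p] = K) = C(N,K) * p^K * q^(N-K)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def pmf_by_enumeration(n, k, p):
    """Add up the probabilities of all S/F strings with exactly k letters S."""
    total = 0.0
    for outcome in product("SF", repeat=n):
        if outcome.count("S") == k:
            total += p ** k * (1 - p) ** (n - k)
    return total

n, p = 3, 0.3
for k in range(n + 1):
    # The formula and the direct enumeration should agree up to rounding.
    print(k, binomial_pmf(n, k, p), pmf_by_enumeration(n, k, p))
```

Enumeration visits 2^N strings, so it is practical only for small N. The formula has no such limitation.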

Incidentally, recall the Newton's binomial formula presented in this course in the chapter on mathematical induction:
(a+b)^n = Σ{i∈[0,n]} C(n,i)·a^(n−i)·b^i
(summation by i from 0 to n).
As you see, with a=q and b=p the terms of the binomial formula are exactly the individual probabilities of the random variable that has binomial distribution of probabilities. That's why our random variable's distribution of probabilities is called Binomial.

Mean (Expectation) and Variance

In the lecture dedicated to a definition of the distribution of probabilities we defined Bernoulli and Binomial distributions and mentioned that a Binomial random variable with parameters N and p can be considered as a sum of N independent Bernoulli random variables with a probability of SUCCESS p.

Therefore, using the additive properties of the expected value (mean) of a sum of random variables, we can derive the expected value of the Binomial random variable ξ[N,p]. It is just a sum of expected values of N Bernoulli random variables with a probability of SUCCESS p. Since the expected value (mean) of such a Bernoulli random variable equals p, the expected value of our Binomial random variable is
E(ξ[N,p]) = N·p

Similarly, we know that a variance of a sum of independent random variables equals to a sum of their variances.

As mentioned many times above, a Binomial random variable ξ[N,p] is not just a sum of N Bernoulli random variables with a probability of SUCCESS p, but a sum of N independent Bernoulli random variables.
The variance of each such Bernoulli random variable, as we know, is p·q, where q=1−p.
Therefore, the variance of our Binomial random variable is
Var(ξ[N,p]) = N·p·q

From this we can derive the standard deviation of Binomial random variable ξ[N,p]:
σ(ξ[N,p]) = √(N·p·q)
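
These formulas were obtained from the properties of sums of random variables. As a cross-check, the sketch below computes the mean and variance directly from the probabilities C(N,K)·p^K·q^(N−K), using the definitions of a weighted average. The values N=20 and p=0.3 and the name `binomial_moments` are assumptions made for illustration.

```python
from math import comb, sqrt

def binomial_moments(n, p):
    """Mean and variance computed directly from the probabilities."""
    q = 1 - p
    pmf = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, variance

n, p = 20, 0.3
mean, variance = binomial_moments(n, p)
print(mean, n * p)                            # direct sum versus N·p
print(variance, n * p * (1 - p))              # direct sum versus N·p·q
print(sqrt(variance), sqrt(n * p * (1 - p)))  # standard deviation
```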

## Wednesday, October 22, 2014

### Unizor - Probability - Bernoulli Distribution

Let's examine the properties of Bernoulli distribution of probabilities, a distribution of a random variable ξ, defined on a space of only two elementary events that we call SUCCESS (with a probability measure p) and FAILURE (with probability measure q=1−p), and taking, correspondingly, two values
ξ(SUCCESS) = 1 and
ξ(FAILURE) = 0.
So, we can say that our random variable ξ takes a value of 1 with probability p and a value of 0 with probability q=1−p:
P(ξ=1) = p and
P(ξ=0) = q = 1−p

Graphical representation of this distribution of probabilities is trivial. We build one rectangle with a segment [0,1] as a base and the height of p that represents the measure of probability of our random variable ξ to take its first value of 1 and another rectangle with a segment [1,2] as a base and the height of q that represents the measure of probability of ξ to take its second value of 0.
The total area of these two rectangles is, obviously, equal to 1, as it should be, since the sum of probabilities of ξ to take all possible values must be equal to 1.

The next task is a calculation of the expected value or mean of our random variable ξ.
Recall that, if a random variable takes certain discrete values with known probabilities, its expectation is a weighted average of its values with probabilities as weights.
In our case there are only two values, 1 and 0 that a random variable ξ takes with probabilities, correspondingly, p and q=1−p. So, the expectation (or mean) of such a random variable equals to
E(ξ) = 1·p + 0·q = p
This value of expectation is measured in the same units as the values of our random variable. For example, if two values, 1 and 0, are dollars, the expectation is p dollars. In this case it's just a coincidence that both the expectation and probability of having a value of 1 are the same and equal to p. The probability is a measure of frequency and, as such, has no units of measurement, while the expectation has the measurement of the random variable itself.

This result was easily predictable based on the statistical meaning of probabilities. If we repeat our Bernoulli trial N times, the number of SUCCESSes, where ξ equals to 1, will be approximately p·N and the remaining q·N results, where ξ equals to 0, will be FAILUREs. The precision of this approximation is increasing with the number of trials increasing to infinity.
Therefore, the average value of our random variable in N experiments will be approximately equal to
Ave(ξ)=(1·p·N + 0·q·N)/N=p
As the number of experiments grows, the statistical average will more and more precisely equal to p and, in a limit case, the equality will be absolutely precise. So, the expectation of a random variable is its statistical average value as the number of experiments grows to infinity.

The expectation or mean value of a random variable is, arguably, its most important characteristic. Next in importance comes its standard deviation, which is a square root of its variance, which, in turn, is a weighted average of squares of deviations of the values of a random variable from its expectation.

Let's calculate these important characteristics.
There are two values our random variable ξ takes, 1 with the probability p and 0 with the probability q. Its mean value is equal to p. Therefore, weighted average of squares of deviations of its values from its expectation is equal to
Var(ξ) = (1−p)^2·p+(0−p)^2·q =
= (1−2·p+p^2)·p+p^2·q =
= p−2·p^2+p^3+p^2·(1−p) =
= p−p^2 = p·(1−p) =
= p·q since q=1−p.

From Var(ξ) = p·q we derive the standard deviation of our random variable
σ(ξ) = √(p·q) = √[p·(1−p)]
The standard deviation is a good measure of how wide the values of a random variable are spread around its mean measured in the same units as the random variable itself and its expectation (mean).

Example

Consider a lottery where you try to guess up to 6 numbers from 49. Assume, for simplicity, that you win (it is a SUCCESS) if you guessed at least 4 numbers and you don't win (a FAILURE) if you guessed 3 numbers or less. Using the results of calculation made in one of the previous lectures, the probability of SUCCESS is about 0.001.
Assume further that in case of SUCCESS you win a prize of \$1000, which we assume to be equal to 1 measured in thousands of dollars and get nothing in case of FAILURE.
What are the expectation and standard deviation of your winning?

Since probability of SUCCESS equals to 0.001, the mean of the winning equals to 0.001 (measured in thousands of dollars), that is \$1.
The standard deviation equals to
√(0.001·0.999) ≅ 0.0316
(also in thousands of dollars) which is about \$32.
So, your average win will be \$1, but deviations from it are quite substantial, averaging around \$32, but, obviously, might be much greater than that.
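
The lottery numbers can be reproduced with a short sketch that follows the definitions of E(ξ), Var(ξ) and σ(ξ) given above. The function name `bernoulli_stats` is illustrative. The probability 0.001 and the prize of one thousand dollars are taken from the example.

```python
from math import sqrt

def bernoulli_stats(p):
    """Mean, variance and standard deviation of a Bernoulli variable."""
    q = 1 - p
    mean = 1 * p + 0 * q
    variance = (1 - mean) ** 2 * p + (0 - mean) ** 2 * q
    return mean, variance, sqrt(variance)

mean, variance, sigma = bernoulli_stats(0.001)
# Values are in thousands of dollars, so multiply by 1000 to get dollars.
print(f"mean = ${mean * 1000:.2f}, sigma = ${sigma * 1000:.2f}")
```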

## Tuesday, October 21, 2014

### Unizor - Probability - Binary Distributions

Binary probability distribution is a distribution related to random experiments with just two outcomes.

In this lecture we will consider two binary distributions:
Bernoulli distribution and Binomial distribution.

Bernoulli Distribution

This is a distribution within a sample space that contains only two elementary events called SUCCESS and FAILURE. Then the measure of probability p (0≤p≤1) is assigned to one of them and the measure of probability q=1−p is assigned to another.

Usually, we will not deal with this sample space or its elementary events, but, instead, assume that there is a random variable ξ, defined as a numeric function on this sample space, that takes the value of 1 on one elementary event - ξ(SUCCESS)=1 - and the value of 0 on another - ξ(FAILURE)=0, with probabilities, correspondingly, p and q=1−p.
Symbolically,
P(ξ=1) = p
P(ξ=0) = q = 1−p

We can describe this differently, using the random variable ξ we defined above that takes two values 1 and 0, correspondingly on SUCCESS and FAILURE. Assume we repeat our experiment with two outcomes again and again, and the result of the Jth experiment is Ej. Then ξ(Ej)=1 if Ej=SUCCESS and ξ(Ej)=0 if Ej=FAILURE.
Then, if we conduct N experiments, the sum of all ξ(Ej), where J runs from 1 to N, symbolically expressed as Σ{J∈[1,N]} ξ(Ej), is the number of times our experiment ended in SUCCESS.
Therefore, the ratio of the number of SUCCESS outcomes to a total number of experiments equals to
[Σξ(Ej)] / N
Since the limit of this ratio, as the number of experiments increases to infinity, is the definition of the measure of probability of the outcome SUCCESS, we can write the following scary looking equality that symbolically states what we talked about when defining the Bernoulli distribution:
lim(N→∞){[Σξ(Ej)] / N} = p

A possible interpretation of the above equality that involves the limits might be that with large number of experiments N the number of SUCCESS outcomes is approximately equal to p·N.
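
This interpretation can be illustrated with a simulation. The sketch below is only an experiment with a pseudo-random number generator, not a proof. The probability p=0.3, the fixed seed and the name `success_frequency` are assumptions made for illustration.

```python
import random

def success_frequency(p, n, seed=0):
    """Fraction of SUCCESSes in n simulated Bernoulli trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

for n in (100, 10_000, 100_000):
    # The ratio [Σξ(Ej)] / N is expected to get closer to p as N grows.
    print(n, success_frequency(0.3, n))
```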

A simple example of a Bernoulli distribution is a coin tossing. With an ideal coin the heads and tails have equal chances to come up, therefore their probabilities are 1/2 each.
If we associate a random variable ξ with this random experiment and set ξ(HEADS)=1 and ξ(TAILS)=0, we obtain a classic example of a Bernoulli random variable ξ:
P(ξ=1) = p = 1/2 and
P(ξ=0) = q = 1−p = 1/2

Binomial Distribution

Consider N independent Bernoulli random experiments with results SUCCESS or FAILURE and the same probability of SUCCESS in each one. The number of SUCCESSes among the results of this combined experiment is a random variable. It can take values from 0 to N with different probabilities. The distribution of this random variable is called Binomial.

Obviously, we are interested in quantitative characteristics of this distribution, more precisely, we would like to calculate the probability of having exactly K SUCCESSes out of N independent Bernoulli experiments with the probability of SUCCESS equal to p in each one, where K can be any number from 0 to N.

Using the language of Bernoulli random variables, our task can be formulated differently.
Let ξi be a Bernoulli random variable that describes the i-th Bernoulli experiment, that is it is equal to 1 with a probability p and equals to 0 with a probability q=1−p. Then the sum of N such random variables is exactly the number of SUCCESSes in N Bernoulli experiments we are talking about.
So, the random variable
η = Σ ξi ,
where all ξi are independent Bernoulli random variables,
has Binomial distribution.

Let's now calculate the probabilities of our Binomial random variable η to have different values, that is let's determine the quantitative characteristic of this distribution.

We are interested in determining the value of P(η=K) for all K from 0 to N.

For a sum η of N independent Bernoulli variables ξi to be equal to K, exactly K out of N of these Bernoulli variables must be equal to 1 and the other N−K variables must be equal to 0.
From combinatorics theory we know that we can choose K elements out of N in
C(N,K) = (N!)/[(K!)·(N−K)!]
ways.
Once chosen, these K random variables must all be equal to 1, which happens with a probability p^K, and the other N−K variables must all be equal to 0, which happens with a probability q^(N−K).
That determines the probability of a Binomial random variable to have a value of K:
P(η=K) = C(N,K) · p^K · q^(N−K)
The only two parameters of this distribution are the number of Bernoulli random variables N participating in the Binomial distribution (which is the number of Bernoulli random experiments results of which we follow) and the probability of SUCCESS p for each such Bernoulli experiment.
The probability q is not a new parameter since q=1−p and the formula above can be written as
P(η=K) = C(N,K) · p^K · (1−p)^(N−K)
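
The description of η as a sum of independent Bernoulli variables suggests another way to obtain the same probabilities: start with an empty sum and add one Bernoulli variable at a time. The sketch below does this and compares the result with the formula. The values N=5 and p=0.25 and the function names are assumptions made for illustration.

```python
from math import comb

def add_bernoulli(dist, p):
    """Distribution of (old sum + one more independent Bernoulli variable).
    dist[k] is the probability that the old sum equals k."""
    q = 1 - p
    new = [0.0] * (len(dist) + 1)
    for k, prob in enumerate(dist):
        new[k] += prob * q      # the new variable equals 0
        new[k + 1] += prob * p  # the new variable equals 1
    return new

def binomial_by_summation(n, p):
    """Distribution of a sum of n independent Bernoulli variables."""
    dist = [1.0]  # a sum of zero variables equals 0 with probability 1
    for _ in range(n):
        dist = add_bernoulli(dist, p)
    return dist

n, p = 5, 0.25
for k, prob in enumerate(binomial_by_summation(n, p)):
    formula = comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(k, prob, formula)  # the two columns should agree up to rounding
```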

## Wednesday, October 8, 2014

### Unizor - Probability - Continuous Distribution

Let's consider a different random experiment that is modeled by an infinite and uncountable set of elementary events and, correspondingly, uncountable number of values of a random variable defined on them.

For example, we measure weights of tennis balls manufactured by a particular plant. In theory (and we are talking about a mathematical model, not the real world), this weight can be any real non-negative number within certain limits, like from 50 to 60 grams. Let's assume that we can measure the weight absolutely precisely. Repeating the experiment again and again, counting the times this weight exactly equals, say, 55 grams and taking a ratio of this count to the total number of experiments, we will get closer and closer to zero. The same is true for any other specific weight.

So, any particular value of our random variable has a probability of zero, but the number of these values is a number of all real numbers from 50 to 60 - an uncountable infinity. This presents a mathematical challenge in operating with particular values of this random variable.

To overcome this challenge, instead of considering individual values of our random variable, we should consider intervals.
In our example of the weight of a tennis ball, we can talk about a probability of this weight to be in the interval from, say, 54 to 56 grams. This probability will be greater than zero. The wider our interval, the larger the probability. At the extreme, for an interval from 50 to 60 grams, the probability will be equal to 1 because all tennis balls are manufactured with the weight in this interval.

The probability of our random variable of having any particular exact value equals to zero, but it has a non-zero probability of having a value within some interval, different probabilities for different intervals. Such random variables are called continuous.
Then for such random variable ξ we can say that the probability of ξ to take a value in the interval [a,b] equals to p (which depends on a and b). Usually, all possible values of a continuous random variable constitute a finite or infinite continuous interval. The probability of this random variable to take a value within this interval equals to 1 and the probability of it to take a value in some narrower interval is less than 1.
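
The difference between exact values and intervals can be seen in a simulation of the tennis ball example. Note the loud assumption: the lecture does not say how the weights are distributed between 50 and 60 grams, so the sketch below simply assumes they are spread uniformly. The fixed seed and the number of simulated balls are also arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
# ASSUMPTION: weights are spread uniformly over 50..60 grams.
weights = [rng.uniform(50.0, 60.0) for _ in range(100_000)]

exactly_55 = sum(1 for w in weights if w == 55.0)
between_54_and_56 = sum(1 for w in weights if 54.0 <= w <= 56.0)

print(exactly_55 / len(weights))         # frequency of one exact value
print(between_54_and_56 / len(weights))  # frequency of an interval
```

Under the uniform assumption the interval from 54 to 56 grams covers one fifth of the whole range, so its frequency should be near 0.2, while the frequency of the exact value 55 stays at zero.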

Example - Sharp Shooting Competition

Sharpshooters are shooting a target, and the random variable we are interested in is the distance of a point where a bullet hits the target from the target's center.
For a particular sharpshooter, assuming his skill level is constant, he does not get tired and does not miss by more than 0.5 meters, the continuous distribution of this random variable is defined on the range of values from 0 (when he hits exactly at the center of a target) to a maximum deviation of his bullet from the center that we assumed to be 0.5 meters.

For any exact value of our random variable, say, 0.2768 meters, the probability of having this value is 0. It is obvious if we recall that the probability is a limit of the ratio of the number of occurrence of a particular event to a total number of experiments. As a sharpshooter fires shots to infinity, the ratio of the number of shots on a distance of exactly 0.2768 meters from a center, as well as on any other exact distance, tends to zero.

The continuous distribution of probabilities can be represented graphically (though only approximately) in a way similar to how we presented the discrete distributions. First of all, we break the entire range of values of our random variable (the distance from the center of a target) into 5 smaller intervals, each 0.1 meters wide, and mark these points on the X-axis: 0, 0.1, 0.2, 0.3, 0.4 and 0.5. On each interval between these values we construct a rectangle of a height equal to the corresponding probability of our random variable to fall within this interval. So, for a random variable ξ (results of the first shooter) the rectangles might be:
base [0,0.1] - height 0.6
base [0.1,0.2] - height 0.2
base [0.2,0.3] - height 0.1
base [0.3,0.4] - height 0.07
base [0.4,0.5] - height 0.03

Obviously, the more intervals we use to break the entire range of values of a random variable into smaller intervals - the more precisely we can characterize the continuous distribution.
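
The rectangles of the first shooter can be stored as two lists and used to answer simple questions, such as the probability of hitting within a given distance from the center. This is a minimal sketch. The function name `probability_within` is illustrative, and it only handles distances that coincide with the interval boundaries, because the table says nothing about how probability is spread inside each interval.

```python
from math import isclose

edges = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # meters from the center
heights = [0.6, 0.2, 0.1, 0.07, 0.03]   # probability of each interval

def probability_within(distance):
    """P(hit is no farther than `distance` from the center), for a
    distance that coincides with one of the interval boundaries."""
    return sum(h for right, h in zip(edges[1:], heights)
               if right <= distance + 1e-12)

assert isclose(sum(heights), 1.0)  # the intervals cover all outcomes
print(probability_within(0.2))     # two innermost intervals together
print(probability_within(0.5))     # the whole range of distances
```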

## Tuesday, October 7, 2014

### Unizor - Probability - Discrete Distribution

In the previous lectures we introduced a concept of a random experiment and, as its mathematical model, we described sample space Ω (a set) that contained finite number N of elementary events e1, e2,...,eN (elements of this set) modeling the results (outcomes) of our random experiment.

With each such elementary event eK (where K∈[1,N]) we associated certain real number pK, its probability
0 ≤ pK ≤ 1 (K∈[1,N])
ΣpK = 1 where K∈[1,N]

This probability has all the characteristics of a measure (like weight, length or area) - it is non-negative and additive - with the only restriction that the sum of all probability measures of all elementary events totals to 1 since it reflects the ratio of occurrence of any outcome to a total number of experiments.

The set of probabilities p1, p2,...,pN is called a probability distribution of our random experiment with finite number of outcomes.

If there is a random variable ξ defined on the set of elementary events that takes the value xK for an event eK (K∈[1,N]), then these same probabilities define the probability distribution of the random variable ξ:
P(ξ=x1) = p1
P(ξ=x2) = p2
...
P(ξ=xN) = pN

As you see, in this description we address random experiments with finite number of outcomes. The probability distributions associated with these random experiments, their elementary events and random variables defined on these events are called discrete since the different values of probabilities as well as the different values of random variables defined on the elementary events are separated from each other. They can be represented as individual points on a numeric line.

In a more complicated case there might be infinite but countable number of elementary events and values of a random variable defined on them.
For example, an experiment might be to choose any natural number N (this is the elementary event and, at the same time, the value of a random variable ξ defined on it) and assign a value of 1/(2^N) to a probability associated with this elementary event:
P(ξ=N)=1/(2^N).
There is an infinite but countable number of different elementary events, all probabilities are in the range from 0 to 1 and their sum (which is the sum of an infinite geometric progression 1/2 + 1/4 + 1/8 + ...) equals to 1, as can be easily shown.
The distribution of probabilities in this and analogous cases is also considered discrete since there is always a non-zero distance between different measures of probabilities and, if a random variable is defined on these elementary events, the values of such random variable will also be discrete, that is separated from each other.

Incidentally, for the example above it would be interesting to calculate the expected value of the random variable ξ, that is, to find the sum
Σ[K/(2^K)] where K∈[1,∞).
It's a good problem on geometric progressions and we recommend that you try to solve it yourself. The answer should be 2, by the way, but you need a little trick to come up with it.
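
Before attempting the proof, the two claims above (the probabilities 1/(2^K) sum to 1 and the expected value is 2) can be checked numerically. The sketch below is only a sanity check, not a proof; the function name `partial_sums` is our own choice for illustration.

```python
from fractions import Fraction

def partial_sums(n_terms):
    """Return (sum of 1/2^K, sum of K/2^K) for K = 1..n_terms as exact fractions."""
    total_probability = Fraction(0)
    expectation = Fraction(0)
    for k in range(1, n_terms + 1):
        p = Fraction(1, 2**k)   # P(xi = K) = 1/(2^K)
        total_probability += p
        expectation += k * p    # contribution K * P(xi = K)
    return total_probability, expectation

# Watch the partial sums approach their limits as more terms are added.
for n in (5, 10, 20, 40):
    prob, exp = partial_sums(n)
    print(n, float(prob), float(exp))
```

The partial sums only approach their limits; showing that the limits are exactly 1 and 2 still requires the geometric progression argument.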

It's very useful to represent the distribution of probabilities in a graphic form using a concept of an area as a substitute for a probability measure. On an X-axis in this representation we will use the points 1, 2, 3,... as the corresponding representation of elementary events e1, e2, e3,... This can be used for both finite and infinite countable number of elementary events.
Next, on each segment from K−1 to K we build a rectangle of the height equal to the probability of the elementary event eK. The resulting figure - a combination of all such rectangles - is a good representation of the distribution of probabilities among elementary events. Its total area would always be 1 and elementary events with larger probability measure would be represented by higher rectangles.

For example, the distribution of probabilities for an ideal die would be a set of 6 rectangles of equal height of 1/6.

In the example above, when the probability of choosing the number N equals to 1/(2^N), the picture would be different. Following the same rule, the rectangle built on the segment from 0 to 1 will have a height of 1/2, from 1 to 2 - 1/4, from 2 to 3 - 1/8 etc. Every next rectangle would have a height equal to a half of the previous one, sloping down to 0 as we move to infinity.

## Monday, September 29, 2014

### Unizor - Probability - Random Variables - Independence

There are a few different but equivalent approaches to define independent random variables. We will use a simple approach based on the fact that we are only dealing with random variables that take finite number of values.
So, assume a random variable ξ takes values
x1, x2,..., xM
with probabilities
p1, p2,..., pM.
Further, assume a random variable η takes values
y1, y2,..., yN
with probabilities
q1, q2,..., qN.

Recall that a random variable is a numeric function defined on each elementary event participating in the random experiment. The fact that a random variable ξ takes some value xi means that one of the elementary events, where the value of this numeric function equals to xi, indeed occurred. The combined probability of all the elementary events where this numeric function equals to xi is the probability of the random variable ξ having the value xi. The combination of all these elementary events makes up an event characterized by a description "an event where random variable ξ takes the value xi".

For instance, when rolling two dice and summing the rolled numbers (this sum is our random variable), elementary events (1,5), (2,4), (3,3), (4,2) and (5,1) combined together form an event described as "our random variable took a value of 6".

Let's assign a symbol Ai to this combined event of a random variable ξ of taking the value xi. So, according to our notation,
P(ξ=xi) = P(Ai)

Analogously, let Bj be an event of a random variable η taking the value yj. So, according to our notation,
P(η=yj) = P(Bj)

The above considerations allow us to use the language and properties of events to describe the values of and relationships between random variables.
Thus, we can define a conditional probability of one random variable relative to another using already defined conditional probability between events:
P(ξ=xi | η=yj ) = P(Ai | Bj )

If the conditional probability of a random variable ξ taking any one of its values under the condition that a random variable η took any one of its values equals to an unconditional probability of ξ taking that value, then a random variable ξ is independent of a random variable η.
In other words, ξ is independent of η if
P(ξ=xi | η=yj ) = P(ξ=xi )
where index i can take any value from 1 to M and index j can take values from 1 to N.

Important Property of Independent Random Variables

From the definition and properties of conditional probability for events X and Y we know that
P(X | Y) = P(X∩Y)/P(Y)
Therefore,
P(ξ=xi | η=yj ) = P(ξ=xi ∩ η=yj ) / P(η=yj )
But, if random variables ξ and η are independent, the conditional probability of ξ taking some value under some condition imposed on η is the same as its unconditional probability. Therefore,
P(ξ=xi ) = P(ξ=xi ∩ η=yj ) / P(η=yj )
from which follows
P(ξ=xi ∩ η=yj ) = P(ξ=xi ) · P(η=yj )
Verbally, it can be expressed as the statement that the probability of two independent variables simultaneously taking some values equals to a product of probabilities of them separately taking their corresponding values.
This, for random variables, similarly to an analogous property of probability of independent events, is a characteristic property of their independence. It is completely equivalent to a definition of independent variables that uses conditional probability.
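
The product property can be tested directly on a small sample space. The sketch below assumes two ideal dice (36 equally likely pairs) and checks the equality P(ξ=xi ∩ η=yj) = P(ξ=xi)·P(η=yj) for every pair of values. The helper names (`prob`, `are_independent`, `first_die`, `second_die`, `dice_sum`) are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely ordered pairs (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on an elementary event."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def are_independent(xi, eta):
    """Check P(xi=x and eta=y) == P(xi=x) * P(eta=y) for all pairs of values."""
    xs = {xi(o) for o in outcomes}
    ys = {eta(o) for o in outcomes}
    return all(
        prob(lambda o: xi(o) == x and eta(o) == y)
        == prob(lambda o: xi(o) == x) * prob(lambda o: eta(o) == y)
        for x in xs
        for y in ys
    )

def first_die(o):
    return o[0]

def second_die(o):
    return o[1]

def dice_sum(o):
    return o[0] + o[1]

print(are_independent(first_die, second_die))  # numbers on the two dice
print(are_independent(first_die, dice_sum))    # first die versus the sum
```

The second check fails because, for example, the first die showing 1 makes a sum of 12 impossible, so the joint probability is 0 while the product of the separate probabilities is not.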

## Saturday, September 27, 2014

### Unizor - Probability - Random Variables - Problems 2

Problem 2.1.
Calculate expected value E(ξ), variance Var(ξ) and standard deviation σ(ξ) of a random variable ξ equal to a winning in a simplified game of roulette played by the following rules:
(a) You bet \$1 on a number 23 in a game of roulette with a spinning wheel that includes numbers from 1 to 36, 0 and 00.
(b) If the ball stops on a spinning wheel in a cell with a number 23, you win \$36.
(c) If the ball stops in any other cell (including 0 and 00), you lose your \$1 bet.

E(ξ) = −1/38 ≅ −0.026316
Var(ξ) ≅ 35.078255
σ(ξ) ≅ 5.922690

Solution

Our first step is to specify the values
x1, x2,..., xN
that our random variable ξ takes, and the corresponding probabilities
p1, p2,..., pN
of each such value.

First of all, there are only two outcomes of this game, win or lose. Correspondingly, the number N of different values our random variable can take equals to 2. So, we are dealing with two random values, 36 (win) and −1 (lose).
Considering there are 38 different outcomes for a position of ball on the spinning wheel (numbers from 1 to 36, 0 and 00) with equal chances of each, the probability of winning is 1/38, while the probability of losing is 37/38.

Here is full specification of our random variable:
P(36) = 1/38
P(−1) = 37/38

The expected value of this random variable equals to a weighted average of its values with the corresponding probabilities as weights.
E(ξ) =
36·(1/38) + (−1)·(37/38) =
= (36−37)/38 = −1/38 ≅
≅ −0.026316

The variance is a weighted average of squares of differences between the values of our random variable and its expected value with corresponding probabilities as weights.
Var(ξ) =
[36−(−1/38)]^2·(1/38) +
+ [−1−(−1/38)]^2·(37/38) ≅
≅ 35.078255

Finally, the standard deviation is a square root of the variance and equals to
σ(ξ) ≅ 5.922690
The end.
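
The arithmetic of this solution can be reproduced with exact fractions. The following is a minimal sketch; the dictionary `roulette` and the helper functions `expectation` and `variance` are names chosen for this illustration.

```python
from fractions import Fraction
from math import sqrt

def expectation(dist):
    """Weighted average of values; dist maps value -> probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Weighted average of squared deviations from the expectation."""
    a = expectation(dist)
    return sum((x - a) ** 2 * p for x, p in dist.items())

# Winning 36 with probability 1/38, losing 1 with probability 37/38.
roulette = {36: Fraction(1, 38), -1: Fraction(37, 38)}

e = expectation(roulette)
v = variance(roulette)
print("E   =", e, "=", float(e))
print("Var =", v, "=", float(v))
print("sigma =", sqrt(v))
```

Working with `Fraction` keeps the expectation and variance exact, so rounding happens only when the results are printed as decimals.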

Problem 2.2.
The problem above is a particular case of a more general problem. Assume a random variable ξ takes only two values, X and Y, with probabilities P(X)=p and P(Y)=q, where both p and q are non-negative and p+q=1.
Determine its expectation, variance and standard deviation.

E(ξ) = X·p+Y·q
Var(ξ) = (X−Y)^2·p·q
σ(ξ) = |X−Y|·√(p·q)

Solution

There are only two outcomes of this experiment with our random variable taking values X or Y with probabilities p and q correspondingly, where p+q=1 (we will use this equality to replace 1−q with p and 1−p with q).
So, the expectation equals to
E(ξ) = X·p+Y·q
The variance, as a weighted average of squares of differences between the values of our random variable and its expectation, where weights are probabilities, is equal to
Var(ξ) =
[X−(X·p+Y·q)]^2·p +
+ [Y−(X·p+Y·q)]^2·q =
= (X−Y)^2·q^2·p + (Y−X)^2·p^2·q =
= (X−Y)^2·p·q·(p+q) =
= (X−Y)^2·p·q
Finally, the standard deviation is a square root from the variance and equals to
σ(ξ) = |X−Y|·√(p·q)
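
As a cross-check of the algebra, the closed form (X−Y)^2·p·q can be compared with the variance computed directly from the definition. The sketch below does this for a few sample inputs; the function names and the test triples (other than the roulette case from Problem 2.1) are arbitrary choices made for illustration.

```python
from fractions import Fraction

def two_point_moments(x, y, p):
    """Expectation and variance from the definitions, with q = 1 - p."""
    q = 1 - p
    mean = x * p + y * q
    var = (x - mean) ** 2 * p + (y - mean) ** 2 * q
    return mean, var

def closed_form_variance(x, y, p):
    """The formula derived above: (X - Y)^2 * p * q."""
    return (x - y) ** 2 * p * (1 - p)

samples = [
    (36, -1, Fraction(1, 38)),  # roulette bet from Problem 2.1
    (1, -1, Fraction(1, 2)),    # fair coin-flip game
    (5, 2, Fraction(3, 10)),    # arbitrary values
]
for x, y, p in samples:
    mean, var = two_point_moments(x, y, p)
    assert var == closed_form_variance(x, y, p)
    print(x, y, p, mean, var)
```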

### Unizor - Probability - Random Variables - Problem 1

Calculate expected value E(ξ), variance Var(ξ) and standard deviation σ(ξ) of a random variable ξ formed by a sum of numbers on the top of two perfect dice rolled together.

### Unizor - Probability - Random Variables - Variance

Random variables take many different values with different probabilities. A full description of a discrete random variable includes an enumeration of all its values and the corresponding probabilities. That is a full but lengthy description of a random variable. Though it is full, it does not provide an easy answer to such questions as "What is my risk if I play this game for money?" or "In what interval should I expect to observe the values of this random variable most of the time?" There is, however, a desire to characterize the random variable with a small number of properties to give an idea of its behavior, evaluate risk (if risk is involved) and predict its values with certain level of precision, which, arguably, is one of the main purposes of theory of probabilities.

The first such property that we have introduced is the expected value or expectation of a random variable. Though it does provide some information about the random variable, it's not really sufficient to make good predictions about its values.

Assume that our random variable ξ takes values
x1, x2,..., xN
with probabilities
p1, p2,..., pN.
Then its expectation is
E(ξ) = x1·p1+x2·p2+...+xN·pN
This expectation can be viewed as a weighted average of the values of our random variable with probabilities of these values taken as weights.

To evaluate the deviation of the values of a random variable from its expectation, we are interested in weighted average of the squares of differences between values of this random variable and its expectation, that is in the expectation of a new random variable (ξ−a)^2, where a=E(ξ):
Var(ξ) = (x1−a)^2·p1+(x2−a)^2·p2+...+(xN−a)^2·pN

## Thursday, September 11, 2014

### Unizor - Probability - Random Variables - Expectation Sum

Our goal in this lecture is to prove that expectation of a sum of two random variables equals to a sum of their expectations.
Consider the following two random experiments (sample spaces) and random variables defined on their elementary events.

Ω1=(E1,E2,...,Em )
with corresponding measure of probabilities of these elementary events
P=(P1,P2,...,Pm )
(that is, P(Ei )=Pi - non-negative numbers with their sum equaled to 1)
and random variable ξ defined for each elementary event as
ξ(Ei) = Xi where i=1,2,...,m

Ω2=(F1,F2,...,Fn )
with corresponding measure of probabilities of these elementary events
Q=(Q1,Q2,...,Qn )
(that is, Q(Fj )=Qj - non-negative numbers with their sum equaled to 1)
and random variable η defined for each elementary event as
η(Fj) = Yj where j=1,2,...,n

Separately, the expectations of these random variables are:
E(ξ) = X1·P1+X2·P2+...+Xm·Pm
E(η) = Y1·Q1+Y2·Q2+...+Yn·Qn

Let's examine the probabilistic meaning of a sum of two random variables defined on two different sample spaces.
Any particular value Xi+Yj is taken by a new random variable ζ=ξ+η defined on a new combined sample space Ω=Ω1×Ω2 that consists of all pairs of elementary events (Ei,Fj ) with the corresponding combined measure of probabilities of these pairs equal to
R(Ei,Fj ) = Rij
where index i runs from 1 to m and index j runs from 1 to n.

Thus, we have a new random variable ζ=ξ+η defined on a new sample space Ω of m·n pairs of elementary events from two old spaces Ω1 and Ω2 as follows
ζ(Ei,Fj ) = Xi+Yj
with probability Rij
Consider a sum Ri1+Ri2+...+Rin. It represents a probability of the first experiment resulting in a fixed elementary event Ei while the second experiment results in either F1, or F2, or any other elementary event it may. That is, the result of the second experiment is irrelevant and this sum simply represents a probability of the first experiment resulting in Ei, that is, it is equal to Pi:
Ri1+Ri2+...+Rin = Pi.
Similarly, fixing the result of the second experiment to Fj and letting the first experiment end up in any way it may, we conclude
R1j+R2j+...+Rmj = Qj.
Keeping in mind the above properties of probabilities Rij, we can calculate the expectation of our new random variable ζ.
E(ζ) = E(ξ+η) =
= (X1+Y1)·R11+...+(X1+Yn)·R1n +
+ (X2+Y1)·R21+...+(X2+Yn)·R2n +
...
+ (Xm+Y1)·Rm1+...+(Xm+Yn)·Rmn =
(opening parentheses, changing the order of summation and regrouping)
= X1·(R11+...+R1n ) +
+ X2·(R21+...+R2n ) +
...
+ Xm·(Rm1+...+Rmn ) +
+ Y1·(R11+...+Rm1 ) +
+ Y2·(R12+...+Rm2 ) +
...
+ Yn·(R1n+...+Rmn ) =
(using the properties of sums of combined probabilities Rij )
= X1·P1 +...+ Xm·Pm +
+ Y1·Q1 +...+ Yn·Qn =
= E(ξ) + E(η)
End of proof.
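
Note that the proof never assumes that ξ and η are independent: it uses only the two marginal sums of Rij. The sketch below illustrates this with a made-up joint distribution that is deliberately not a product of its marginals. All the numbers in `X`, `Y` and `R` are hypothetical values chosen for the example.

```python
from fractions import Fraction

X = [1, 4, 10]   # hypothetical values of xi on E1, E2, E3
Y = [-2, 5]      # hypothetical values of eta on F1, F2

# Hypothetical joint probabilities R[i][j] of the pair (Ei, Fj).
R = [
    [Fraction(1, 10), Fraction(2, 10)],
    [Fraction(3, 10), Fraction(0)],
    [Fraction(1, 10), Fraction(3, 10)],
]
assert sum(sum(row) for row in R) == 1

# Marginal probabilities: Pi = Ri1+...+Rin and Qj = R1j+...+Rmj.
P = [sum(row) for row in R]
Q = [sum(R[i][j] for i in range(len(X))) for j in range(len(Y))]

e_xi = sum(x * p for x, p in zip(X, P))
e_eta = sum(y * q for y, q in zip(Y, Q))
e_zeta = sum(
    (X[i] + Y[j]) * R[i][j]
    for i in range(len(X))
    for j in range(len(Y))
)

print("E(xi)     =", e_xi)
print("E(eta)    =", e_eta)
print("E(xi+eta) =", e_zeta)
```

Here R[0][0] differs from P[0]·Q[0], so the two variables are dependent, yet the expectation of the sum still equals the sum of the expectations.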

## Monday, September 8, 2014

### Unizor - Probability - Random Variables - Expectation Examples

Example 1
What is the expected value of a random variable ξ that equals to a sum of two numbers obtained by rolling two dice?
We have 36 combinations of numbers as a result of rolling two dice. Each pair has the same chances to occur as any other, therefore, the probability of each pair equals to 1/36.
The sum of two numbers takes values from 2 to 12 in the following manner:
2:1,1
3:1,2/2,1
4:1,3/2,2/3,1
5:1,4/2,3/3,2/4,1
6:1,5/2,4/3,3/4,2/5,1
7:1,6/2,5/3,4/4,3/5,2/6,1
8:2,6/3,5/4,4/5,3/6,2
9:3,6/4,5/5,4/6,3
10:4,6/5,5/6,4
11:5,6/6,5
12:6,6
Since each pair has a probability of 1/36 and, as we know, the probability is an additive measure, the probability of a combination of N pairs equals to N/36. Hence, the probabilities of different values for a sum of numbers on two dice equal to
P(ξ=2)=1/36
P(ξ=3)=2/36
P(ξ=4)=3/36
P(ξ=5)=4/36
P(ξ=6)=5/36
P(ξ=7)=6/36
P(ξ=8)=5/36
P(ξ=9)=4/36
P(ξ=10)=3/36
P(ξ=11)=2/36
P(ξ=12)=1/36
Using the formula for expected value
E(ξ) = p1·x1+p2·x2+...+pN·xN
We can calculate the expected value in this particular case as
E(ξ) = (1/36)·2 + (2/36)·3 +
+ (3/36)·4 + (4/36)·5 +
+ (5/36)·6 + (6/36)·7 +
+ (5/36)·8 + (4/36)·9 +
+ (3/36)·10 + (2/36)·11 +
+ (1/36)·12 =
= (2+6+12+20+30+42+
+40+36+30+22+12)/36 =
= 252/36 = 7
So, the expected value of a sum of two numbers obtained by rolling two dice equals to 7.
It means that, if we repeat our experiment with rolling of two dice a very large number of times, the average sum of these two numbers per experiment would be very close to 7.
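
The table of sums and the final value can be reproduced by enumerating all 36 pairs. This is a small sketch of that enumeration; the variable names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely pairs produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each sum is (number of pairs)/36.
distribution = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

expected = sum(s * p for s, p in distribution.items())

for s in distribution:
    print(f"P(sum={s}) = {counts[s]}/36")
print("E =", expected)
```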

Example 2
What is the average value of a card in the game of Blackjack (assume for simplicity that Aces are always valued at 11 points)?
The standard deck of cards contains 52 cards valued by numbers on the cards (from 2 to 10) or by 10 for pictures (Jack, Queen and King), or by 11 (Ace) with four suits per each. Therefore, we have
4 cards valued at 2 points
4 cards valued at 3 points
4 cards valued at 4 points
4 cards valued at 5 points
4 cards valued at 6 points
4 cards valued at 7 points
4 cards valued at 8 points
4 cards valued at 9 points
16 cards (10,J,Q,K) at 10 points
4 cards valued at 11 points
Since the probability to get each card is 1/52, the expected value of any card pulled randomly from a deck is
(4/52)·2+(4/52)·3+(4/52)·4+(4/52)·5+(4/52)·6+(4/52)·7+
+(4/52)·8+(4/52)·9+(16/52)·10+(4/52)·11 =
= (8+12+16+20+24+28+32+36+160+44)/52 = 380/52 ≅ 7.3
Of course, the situation above is a simplification of the real case. The rule that Ace can be counted as 1 or 11, as you wish, complicates the real picture.
Another complication is that usually a few cards have already been given to other players or yourself, so the probabilities must be based not on a full deck, but on whatever is left after the first few cards were distributed and depends on these cards.
However, Blackjack is usually played in a casino with more than one deck (like 5 or 6), which decreases the influence of previously dealt cards on the probabilities.
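
Both the full-deck calculation and the remark about several decks can be illustrated with a short computation. The sketch below keeps the same simplification (Aces always count as 11) and compares the expected value of the next card after a single Ace has been dealt from one deck and from six decks. The function `expected_card_value` and its parameters are hypothetical names used only here.

```python
from fractions import Fraction

def expected_card_value(num_decks, removed=()):
    """Expected value of a randomly drawn card after removing the given card values."""
    counts = {value: 4 * num_decks for value in range(2, 10)}
    counts[10] = 16 * num_decks   # 10, Jack, Queen, King
    counts[11] = 4 * num_decks    # Aces, always counted as 11 here
    for value in removed:
        counts[value] -= 1
    remaining = sum(counts.values())
    return Fraction(sum(v * c for v, c in counts.items()), remaining)

full = expected_card_value(1)
one_deck = expected_card_value(1, removed=[11])
six_decks = expected_card_value(6, removed=[11])

print("full deck:            ", full, float(full))
print("1 deck, one Ace dealt:", one_deck, float(one_deck))
print("6 decks, one Ace dealt:", six_decks, float(six_decks))
```

With six decks the value after removing one Ace stays closer to the full-deck value than with a single deck, which is the effect described above.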

In conclusion, let's notice that the expected value of a random variable is not necessarily one of its real values. For instance, in Example 1 the expected value of 7 is the real value for two dice in positions 1+6, 2+5, 3+4, 4+3, 5+2 and 6+1. However, in Example 2 the expected value of 380/52≅7.3 cannot be a real value of the card.

### Unizor - Probability - Random Variables - Expected Values

Assume our random experiment (like spinning the roulette wheel and a ball on it) results in the following K elementary events (like ball stops in a partition with some number):
Ω = { e1, e2,..., eK }
Assume further that we know the probabilities of each elementary event (maybe, they have equal chances, maybe not):
P(ei) = pi (i runs from 1 to K)
Finally, assume we have defined a random variable on this sample space that represents a numerical result of an experiment (like winning on a \$1 bet):
ξ(ei) = xi (i runs from 1 to K)
If we conduct our experiment N times (consider this a very large number), we expect, approximately, that elementary event e1 will occur in N·p1 number of cases with our random variable, correspondingly, taking a value of x1.
Similarly, in, approximately, N·p2 number of cases our random experiment will end up at elementary event e2 and our random variable will take a value of x2.
And similarly for all other indices.
Knowing these statistics, we can approximate the average value of our random variable ξ. If during N experiments in, approximately, N·p1 number of cases it took value x1, in, approximately, N·p2 number of cases it took value x2, etc. then the sum of all values it took in all experiments equals to
N·p1·x1 + N·p2·x2 + ... + N·pK·xK
and the average value of our random variable ξ per single experiment equals to
E(ξ) = p1·x1+p2·x2+...+pK·xK
This value depends only on probabilities of elementary events pi and corresponding values of our random variable xi and is called mathematical expectation of a random variable or just expectation or expected value.

Can we say that our random variable "takes an average value" calculated above as E(ξ)? No, this is not correct. It might never take this value. But, as the number of random experiments, where it is a numerical result, grows to infinity, its average value per experiment (that is, a sum of all values divided by the number of experiments) tends to E(ξ). The correct statement about this is that our random variable has an expected value (calculated above based on probabilities and individual values) E(ξ).
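
A simulation can illustrate how the average per experiment approaches E(ξ) even though the variable never takes that value. The sketch below assumes the roulette bet used elsewhere in these notes (win 36 with probability 1/38, lose 1 otherwise), for which E(ξ) = −1/38. The function name, the seed and the cell labeling are arbitrary choices for this example.

```python
import random

def simulate_average_winnings(num_spins, seed=0):
    """Average winnings per spin over num_spins simulated roulette spins."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(num_spins):
        cell = rng.randrange(38)            # 38 equally likely cells
        total += 36 if cell == 0 else -1    # cell 0 stands for the number we bet on
    return total / num_spins

for n in (100, 10_000, 200_000):
    print(n, simulate_average_winnings(n))
print("E =", -1 / 38)
```

Individual spins only ever produce 36 or −1, but the running average drifts toward −1/38 as the number of spins grows. The exact printed values depend on the seed.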

## Friday, August 15, 2014

### Unizor - Probability - Random Variables - Introduction

Historically, the theory of probabilities was developed from attempts to study the games and analyze the chances of winning. The concept of random variables has the same roots. Not only did people try to evaluate the chances of winning, they also bet money on these games and wanted to understand how much to bet on this or that game. So, they were dealing with some numerical quantity associated with chances of winning or losing a game. This numerical quantity associated with the results (elementary events) of the game (random experiment) is an example of a random variable. The elementary event can be considered as a qualitative result of an experiment, while the random variable describes its quantitative result.

Consider an example of betting an amount of \$1 in a game of flipping a coin you play against a partner. Let's assume you are betting on "heads", then a coin is flipped. If you guessed correctly and a coin indeed falls on "heads", your partner pays you \$1. If you guess incorrectly, you give \$1 to him. This seems to be a fair game with equal chances to win and lose. Let's associate a numerical value - positive amount of winning or negative amount of losing - with each elementary event occurring in this game, that is introduce a function with numerical values defined on a set of elementary events. The value of this function for an elementary event "heads" equals to 1 and the value for an elementary event "tails" equals to −1. This is an example of a random variable.

Basically, any function defined on a set of elementary events that takes real values depending on results of certain random experiment is a random variable. Let's consider some examples.

Consider a game of roulette. A dealer spins a small ball on a wheel divided into partitions with 36 numbers from 1 to 36 and two additional partitions with 0 and 00 (American version). You can bet on any number from 1 to 36 (among other options which we do not consider for this example). Assume you bet \$1 on number 23. If the ball stops on this number, you win and a dealer pays you \$36. If the ball stops on any other number, including 0 or 00, you lose your bet of \$1. The random variable thus is defined as having a value of 36 on the elementary event "Ball stops in a partition with number 23 in it" and having a value of −1 for all other elementary events.

Since in this course of theory of probabilities we consider only random experiments with finite number of elementary events, all our random variables are defined on the finite set of elements and take finite number of values. The values are always real numbers. The elementary events might or might not have equal chances to occur. Therefore, if there are N elementary events that our random experiment results in, there are N values (not necessarily different) of the random variable defined on this set of elementary events.

Assume, our sample space Ω consists of N elementary events E1, E2,..., EN. Generally speaking, they might not necessarily have equal chances to occur, so let's assume that the probability of occurrence of elementary event Ei is Pi (index i changes from 1 to N), where all these probabilities are not negative and, if added together, sum up to 1.
A random variable is a function with real values defined on each of the elementary events.
Let's use a Greek letter ξ to denote this random variable. Values taken by a random variable ξ depend on the results of the random experiment.
If the experiment results in E1, which can happen with a probability P1, the random variable ξ takes the value of X1.
If the experiment results in E2, which can happen with a probability P2, the random variable ξ takes the value of X2.
etc.
X1=ξ(E1),
X2=ξ(E2),
...
XN=ξ(EN).

So, we have a random variable ξ that takes different values Xi with different probabilities Pi, depending on the results of a random experiment. To analyze this random variable, we don't really need to think much about concrete elementary events (arguments of a function that we call the random variable) and can concentrate only on probabilities of the values that our random variable takes. Thus, we can say that in the game of roulette with 38 partitions on a wheel (numbers from 1 to 36, 0 and 00), betting on number 23, we deal with a random variable that takes a value of 36 with the probability 1/38 (winning by correctly guessing the number where the ball stops spinning) and a value of −1 with the probability 37/38 (losing).

## Wednesday, August 13, 2014

### Unizor - Probability - Conditional - Problems 3

We recommend attempting to solve these problems prior to listening to the lecture or reading answers and proofs provided.
Also assume that all probabilities mentioned in the problems are not equal to zero, that is, we are excluding impossible events.
Wherever possible, try to represent the problem graphically with a set of elementary events.
The use of the word "random" assumes equal chances of occurrence of elementary events unless otherwise specified.

Problem 3.1.
We roll two dice. What is the probability that the sum of two rolled numbers equals to 7 if it's known that the first dice shows a number greater than 3?

Problem 3.2.
We roll two dice. What is the probability that the number on the second dice is greater than 3 if it's known that the numbers on both dice are different?

Problem 3.3.
A student is preparing for an exam. The exam includes 100 questions. The student had time to prepare only for the first 70 questions. During the exam students are randomly divided into 2 groups and the original 100 questions are also divided among these groups: group 1 gets questions from #1 to #50, group 2 gets questions from #51 to #100.
Assume that our student can answer correctly any question he had prepared for (those from #1 to #70), but the one he had not prepared for he answers incorrectly.
He randomly picks a question from those given to his group.
(a) What is the probability of answering it correctly if it's known that he was selected into group 1?
(b) What is this probability on condition that he is selected into group 2?
(c) What is the total (unconditional) probability of answering the question correctly?
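
After solving the problems by hand, the results can be checked by brute-force enumeration of equally likely elementary events. The sketch below is one possible way to do it. For Problem 3.3 it assumes that each group is equally likely and that every question within a group is equally likely, so all 100 pairs (group, question) have the same probability. The helper `conditional_probability` and all variable names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

def conditional_probability(outcomes, event, condition):
    """P(event | condition) when all outcomes are equally likely."""
    given = [o for o in outcomes if condition(o)]
    return Fraction(sum(1 for o in given if event(o)), len(given))

# Problems 3.1 and 3.2: ordered pairs (first die, second die).
dice = list(product(range(1, 7), repeat=2))
p_3_1 = conditional_probability(dice, lambda o: o[0] + o[1] == 7, lambda o: o[0] > 3)
p_3_2 = conditional_probability(dice, lambda o: o[1] > 3, lambda o: o[0] != o[1])

# Problem 3.3: pairs (group, question number), assumed equally likely.
exam = [(1, q) for q in range(1, 51)] + [(2, q) for q in range(51, 101)]

def prepared(o):
    return o[1] <= 70   # the student prepared questions #1..#70

p_3_3a = conditional_probability(exam, prepared, lambda o: o[0] == 1)
p_3_3b = conditional_probability(exam, prepared, lambda o: o[0] == 2)
p_3_3c = conditional_probability(exam, prepared, lambda o: True)

print(p_3_1, p_3_2, p_3_3a, p_3_3b, p_3_3c)
```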

## Tuesday, August 12, 2014

### Unizor - Probability - Conditional - Problems 2

We recommend attempting to solve these problems prior to listening to the lecture or reading answers and proofs provided.
Also assume that all probabilities mentioned in the problems are not equal to zero, that is we are excluding impossible events.

Problem 2.1.
There are 3 white and 2 black socks in a box. You randomly pulled one sock and it happened to be white.
What is the probability of randomly pulling a second white sock?
Solve the problem in more than one way; try to use the concepts of conditional probability.

1/2

Problem 2.2.
There is a computer game that has 2 levels. Experience shows that the probability of passing the level 1 during the first day equals to 50%=1/2. The probability of mastering both levels in the first day equals to 10%=1/10.
What is the probability of passing the level 2 during the first day for children who have managed to pass the level 1 on this day?

1/5

Problem 2.3.
There are three categories of people living in some place: 20% Jewish, 30% Muslims and 50% atheists.
The food restrictions among Jews and Muslims are similar, but not identical. Also, even among people of the same religion there are differences in interpretation of the laws.
Assume that 90% of Jewish people consider some food X as prohibited, while only 80% of Muslims agree with them. Atheists do not have any restrictions on food.
You invite a random person from a street for dinner. What is the probability that he would not eat the food X because he considers it prohibited?

0.42
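
The three answers above can be reproduced with a few lines of exact arithmetic. The sketch below enumerates ordered sock draws for Problem 2.1, applies the definition of conditional probability for Problem 2.2 and the formula of total probability for Problem 2.3. The sock labels and dictionary names are our own.

```python
from fractions import Fraction
from itertools import permutations

# Problem 2.1: ordered draws of two different socks, all equally likely.
socks = ["W1", "W2", "W3", "B1", "B2"]
draws = list(permutations(socks, 2))
first_white = [d for d in draws if d[0].startswith("W")]
p_2_1 = Fraction(
    sum(1 for d in first_white if d[1].startswith("W")), len(first_white)
)

# Problem 2.2: P(level 2 | level 1) = P(both levels) / P(level 1).
p_2_2 = Fraction(1, 10) / Fraction(1, 2)

# Problem 2.3: total probability over the three categories of people.
share = {"Jewish": Fraction(20, 100), "Muslim": Fraction(30, 100), "atheist": Fraction(50, 100)}
avoids_x = {"Jewish": Fraction(90, 100), "Muslim": Fraction(80, 100), "atheist": Fraction(0)}
p_2_3 = sum(share[g] * avoids_x[g] for g in share)

print(p_2_1, p_2_2, p_2_3, float(p_2_3))
```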

## Monday, August 11, 2014

### Unizor - Probability - Conditional - Problems 1

We recommend attempting to solve these problems prior to listening to the lecture or reading answers and proofs provided.
Also assume that all probabilities mentioned in the problems are not equal to zero, that is we are excluding impossible events.

Problem 1.1.
Let A and B be two independent events defined in the sample space Ω.
Prove that the product of the probability of both events occurring by the probability of neither event occurring equals to the product of the probability of occurrence of only event A (but not B) by the probability of occurrence of only event B (but not A).
In symbolic form using the set theory operations, prove that
P(A∩B)·P[Ω∖ (A∪B)] =
P(A∖ B)·P(B∖ A)
or, equivalently,
P(A∩B)·P[NOT (A∪B)] =
P[A∩(NOT B)]·P[B∩(NOT A)]

Problem 1.2.
Prove "geometrically" (see above) and probabilistically that if event A is independent of event B then event NOT A is also independent of event B.

Problem 1.3.
Prove "geometrically" and probabilistically that if event A is independent of event B then event A is also independent of event NOT B.

Problem 1.4.
Prove "geometrically" and probabilistically the following identity
P(A|B) + P[(NOT A)|B] = 1

Problem 1.5.
Is the following equality always true (in other words, is it an identity)?
P(A|B) + P[A|(NOT B)] = 1

## Thursday, August 7, 2014

### Unizor - Probability - Definition - Problems 1

There are 8 people in the room:
A - male, 19 years old;
B - male, 18 years old;
C - female, 18 years old;
D - male, 20 years old;
E - male, 24 years old;
F - female, 28 years old;
G - male, 33 years old;
H - female, 16 years old.
Our random experiment consists of selecting one of them.

(a) What are the set and its elements representing the result of this random experiment and its elementary events? What is the measure of probability of an entire sample space?
{A,B,C,D,E,F,G,H}; Pa=1

(b) What is the measure of probability allocated to each elementary event?
Pb=1/8

(c) What is the subset that represents the event "Randomly selected person is a female"? What is its measure of probability?
{C,F,H}; Pc=3/8

(d) What is the subset that represents the event "Randomly selected person is a male"? What is its measure of probability?
{A,B,D,E,G}; Pd=5/8

(e) What is the subset that represents the event "Randomly selected person is older than 20 years"? What is its measure of probability?
{E,F,G}; Pe=3/8

(f) What is the subset that represents the event "Randomly selected person is a female AND older than 20 years"? What is its measure of probability?
{C,F,H}∩{E,F,G}={F}; Pf=1/8

(g) What is the subset that represents the event "Randomly selected person is a person"? What is its measure of probability?
{A,B,C,D,E,F,G,H}; Pg=1

(h) What is the subset that represents the event "Randomly selected person is younger than 10 years old"? What is its measure of probability?
{}=∅; Ph=0

(i) What is the subset that represents the event "Randomly selected person is NOT younger than 10 years old"? What is its measure of probability?
{A,B,C,D,E,F,G,H}; Pi=1

(j) What is the subset that represents the event "Randomly selected person is NOT younger than 25 years old AND is NOT a male"? What is its measure of probability?
{F,G}∩{C,F,H}={F}; Pj=1/8

(k) What is the subset that represents the event "Randomly selected person is NOT younger than 25 years old OR is a male"? What is its measure of probability?
{F,G}∪{A,B,D,E,G}=
={A,B,D,E,F,G}; Pk=6/8=3/4

(l) What is the subset that represents the event "Randomly selected person is NOT a male OR NOT older than 20 years old"? What is its measure of probability?
{C,F,H}∪{A,B,C,D,H}=
={A,B,C,D,F,H}; Pl=6/8=3/4

(m) What is the subset that represents the event "Randomly selected person is NOT a male AND NOT older than 20 years old"? What is its measure of probability?
{C,F,H}∩{A,B,C,D,H}=
={C,H}; Pm=2/8=1/4
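
The set operations used in these answers map directly onto Python sets: AND is intersection, OR is union and NOT is the complement within the sample space. The sketch below rebuilds a few of the events; the dictionary `people` and the helpers `event` and `prob` are names chosen for this illustration.

```python
from fractions import Fraction

people = {
    "A": ("male", 19), "B": ("male", 18), "C": ("female", 18), "D": ("male", 20),
    "E": ("male", 24), "F": ("female", 28), "G": ("male", 33), "H": ("female", 16),
}
omega = set(people)  # the entire sample space

def event(predicate):
    """Subset of elementary events (people) satisfying the predicate."""
    return {name for name, (sex, age) in people.items() if predicate(sex, age)}

def prob(subset):
    """Each person is equally likely, so the measure is |subset| / |omega|."""
    return Fraction(len(subset), len(omega))

female = event(lambda sex, age: sex == "female")
male = omega - female                       # NOT female
older_than_20 = event(lambda sex, age: age > 20)
not_younger_than_25 = event(lambda sex, age: age >= 25)

print("(f)", sorted(female & older_than_20), prob(female & older_than_20))
print("(k)", sorted(not_younger_than_25 | male), prob(not_younger_than_25 | male))
print("(l)", sorted(female | (omega - older_than_20)), prob(female | (omega - older_than_20)))
print("(m)", sorted(female & (omega - older_than_20)), prob(female & (omega - older_than_20)))
```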

### Unizor - Probability - Conditional - Bayes Theorem

Let's start with the Bayes Theorem itself and its proof, and then discuss its applications. The theorem is very simple, but its applications are very far reaching.

Bayes Theorem
There are two events that are the results of a random experiment - A and B.
P(A) is the probability of event A to occur.
P(B) is the probability of event B to occur.
P(A|B) is the conditional probability of event A to occur if event B has already occurred.
P(B|A) is the conditional probability of event B to occur if event A has already occurred.
Then the following equality is true
P(A|B) = P(B|A) · P(A) / P(B)

Proof
By definition of conditional probability,
P(A|B) = P(A∩B) / P(B) and
P(B|A) = P(A∩B) / P(A).
Let's resolve the second equation for P(A∩B):
P(A∩B) = P(B|A) · P(A)
Now substitute this expression for P(A∩B) into a formula for P(A|B) above.
P(A|B) = P(B|A) · P(A) / P(B)
End of proof.

Consider a case when an entire sample space of elementary events is divided into two subsets with no common elements, X and Y. Then any event W can be represented as a union of two non-intersecting parts W = (W·X)+(W·Y) and the measure of an event W is equal to a sum of measures of its two non-intersecting parts:
P(W) = P(W·X)+P(W·Y)
Using the formula of conditional probability P(A|B)=P(A·B)/P(B), we can rewrite the equation for P(W) above as
P(W) =
= P(X)·P(W|X) + P(Y)·P(W|Y)
This is called a formula of total probability. It is important here that events X and Y are complementary, that is
X·Y=∅ and X+Y=Ω.

Now we can use the Bayes formula to determine the conditional probabilities P(X|W) and P(Y|W):
P(X|W) = P(X·W)/P(W) = P(X)·P(W|X)/P(W) and
P(Y|W) = P(Y·W)/P(W) = P(Y)·P(W|Y)/P(W)

The formulas above show how, knowing the probabilities of occurrence of certain conditions X and Y and the conditional probabilities of a certain event W under these conditions, we can calculate its total probability P(W) and then determine the conditional probabilities P(X|W) and P(Y|W) that condition X or Y actually occurred, given that event W did occur.
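
The two-step calculation (total probability first, then the Bayes formula) can be sketched in a few lines. The input numbers below are illustrative assumptions only: two complementary conditions with P(X)=P(Y)=1/2 and conditional probabilities P(W|X)=1 and P(W|Y)=2/5. The function name `posterior` is our own.

```python
from fractions import Fraction

def posterior(prior_x, w_given_x, w_given_y):
    """Return (P(W), P(X|W), P(Y|W)) for complementary conditions X and Y."""
    prior_y = 1 - prior_x                               # X and Y are complementary
    total = prior_x * w_given_x + prior_y * w_given_y   # formula of total probability
    return total, prior_x * w_given_x / total, prior_y * w_given_y / total

# Illustrative inputs (assumptions, not taken from the theorem itself).
p_w, p_x_given_w, p_y_given_w = posterior(Fraction(1, 2), Fraction(1), Fraction(2, 5))

print("P(W)   =", p_w)
print("P(X|W) =", p_x_given_w)
print("P(Y|W) =", p_y_given_w)
```

The two conditional probabilities always add up to 1, since X and Y together cover the whole sample space.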