## Monday, November 24, 2014

### Unizor - Probability - Normal Distribution - Using Sigma

Consider a game against the house in which you either win 1 bitcoin or lose 1 bitcoin, each with probability 1/2.
Suppose you decide to play a series of 100 games. How much money must you hold in reserve to be able to pay the balance if you are at a loss after this series of games?

The simplest answer is to have 100 bitcoins in reserve to be able to pay for 100 losses in a row. In theory, if you want to assure with the probability of 100% that you will be able to pay, this is the only answer. But, at the same time, you might think that it would be good enough to assure the ability to pay with a relatively high probability, accepting a small risk to lose your good standing with the house. How much do you have to have in reserve then? Could you substantially reduce the amount of money required then?

The normal distribution and sigma limits can be of help in this case.

An individual game can be modeled as a Bernoulli random variable ξ taking values +1 and −1 with equal probabilities of 1/2.
The result of a series of 100 games is a sum of 100 such variables:
η = ξ1+ξ2+...+ξ100.
All members of this sum are independent and identically distributed according to Bernoulli distribution taking values +1 and −1 with equal probabilities of 1/2.

To evaluate the number of bitcoins you need in order to pay off your losses with a relatively high probability, you have to establish your risk limits. For instance, you decide that if the probability of losing S bitcoins or more does not exceed 0.025, it's good enough for you.
To find S (positive amount of bitcoins to have in reserve), we have to satisfy the inequality P(η ≤ −S) ≤ 0.025.

This is not a simple task. You have to find the probabilities P(η=K) for K=−100, K=−98, K=−96 etc. and add them up one by one until their sum exceeds 0.025. The last value of K before that happens gives the maximum loss you need to cover; all greater losses can be ignored since their accumulated probability does not exceed 0.025.
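
The summation described above is laborious by hand but short in code. The following is a minimal sketch, not part of the original lecture; the helper names `prob_eta` and `exact_reserve` are hypothetical. It relies on the fact that η = K happens exactly when (100+K)/2 of the 100 games are won, and it follows the inequality P(η ≤ −S) ≤ 0.025 literally.

```python
from fractions import Fraction
from math import comb

N_GAMES = 100  # number of games in the series


def prob_eta(k: int, n: int = N_GAMES) -> Fraction:
    """Exact P(eta = k) for the sum of n independent fair +1/-1 games."""
    if (n + k) % 2 != 0 or abs(k) > n:
        return Fraction(0)  # eta has the same parity as n and lies in [-n, n]
    wins = (n + k) // 2  # k = wins - losses and wins + losses = n
    return Fraction(comb(n, wins), 2 ** n)


def exact_reserve(risk: Fraction, n: int = N_GAMES) -> int:
    """Smallest S with P(eta <= -S) <= risk, found by accumulating the left tail.

    Falls back to the full reserve n if even the single worst outcome
    is more probable than the accepted risk.
    """
    tail = Fraction(0)
    reserve = n
    for k in range(-n, 1, 2):  # k = -n, -n+2, ... up to 0
        tail += prob_eta(k, n)  # tail is now P(eta <= k)
        if tail > risk:
            break
        reserve = -k
    return reserve


print(exact_reserve(Fraction(25, 1000)))  # prints 22
```

Under these assumptions the exact answer is S = 22: the exact tail P(η ≤ −22) is about 0.0176, while P(η ≤ −20) is about 0.0284 and already exceeds the 0.025 limit.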

As you see, this process is tedious, cumbersome, long, time consuming etc.

There is a better way!

As we know, the distribution of a sum of many independent identically distributed random variables resembles the normal distribution. Which one? The one with an expectation equal to a sum of expectations of each component of a sum and a variance equal to a sum of their variances.

In our case the expectation of each Bernoulli random variable equals to
E(ξ)=(1)·(1/2)+(−1)·(1/2)=0
Its variance is
Var(ξ) = (1−0)^2·(1/2)+(−1−0)^2·(1/2) = 1
Therefore, expectation and variance of a sum of 100 such variables are:
E(η) = 100·E(ξ) = 0
Var(η) = 100·Var(ξ) = 100
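
The arithmetic above can be double-checked with exact fractions. This is only an illustrative sketch; the dictionary `single_game` and the helper names are assumptions introduced here.

```python
from fractions import Fraction

# Distribution of a single game: value -> probability
single_game = {1: Fraction(1, 2), -1: Fraction(1, 2)}


def expectation(dist):
    """Weighted average of the values with probabilities as weights."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Expectation of the squared deviation from the mean."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


n_games = 100
mean_eta = n_games * expectation(single_game)  # expectations always add
var_eta = n_games * variance(single_game)      # variances add for independent variables

print(mean_eta, var_eta)  # prints: 0 100
```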

Let's assume now that random variable η is truly normal. Obviously, this assumption must be quantified, but we are looking for approximate values, so, for our purposes, it is quite a reasonable assumption. Then we can use the sigma limits for a normal random variable. Knowing the variance, we calculate the standard deviation as the square root of the variance:
σ(η) = √100 = 10
And now we are ready to evaluate the probability of the sum of 100 results of the Bernoulli experiments based on sigma limit.

Recall the sigma limits for a normal random variable with expectation μ and standard deviation σ:
1. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−σ, μ+σ] (the narrowest interval of these three) approximately equals to 0.6827.

2. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−2σ, μ+2σ] (the wider interval) approximately equals to 0.9545.

3. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−3σ, μ+3σ] (the widest interval of these three) approximately equals to 0.9973.

Consider the 2σ limit and apply it for our case of μ=0 and σ=10.
Since the probability of a random variable to be in the interval 2σ, that is in the interval [−20,20], equals to 0.9545, the probability to fall outside of this interval (less than −20 or greater than 20) equals to 0.0455. Since normal distribution is symmetrical, the probability to be less than −20 equals to the probability to be greater than 20 and equals to half of that value, that is, approximately, 0.0227.

Therefore, we can say that with the probability of about 0.9773 we will not lose more than 20 bitcoins; the risk of a greater loss is only about 0.0227. As you see, the calculations are simple and, accepting a little risk (with probability 0.0227), we can reduce the required amount of reserve from 100 bitcoins for the no-risk strategy to only 20 bitcoins.
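
The tail probability used in this argument can be checked numerically. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are chosen here for illustration, and the normal model for η is the same approximation the text makes.

```python
from statistics import NormalDist

eta_approx = NormalDist(mu=0, sigma=10)  # normal stand-in for the sum eta

# Left tail beyond -2 sigma: probability of losing more than 20 bitcoins
p_lose_more_than_20 = eta_approx.cdf(-20)

# Probability of landing inside the 2-sigma interval [-20, 20]
p_within_2_sigma = eta_approx.cdf(20) - eta_approx.cdf(-20)

print(round(p_lose_more_than_20, 5))  # about 0.02275
print(round(p_within_2_sigma, 4))     # about 0.9545
```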

## Friday, November 21, 2014

### Unizor - Probability - Normal Distribution - Sigma Limits

As we know, the normal distribution of a random variable (or the distribution of probabilities for a normal random variable) is defined by two parameters:
expectation (or mean) μ and
standard deviation σ.

The expectation defines the center of a bell curve that represents the distribution of probabilities.
The standard deviation defines the steepness of this curve around this center - smaller σ corresponds to a steeper central part of a curve, which means that values around μ are more probable.

Our task is to evaluate the probabilities of the normal variable to take values within certain interval around its mean μ based on the value of its standard deviation σ.

Suppose we are given these two parameters, μ and σ, and, based on their values, we have constructed a bell curve that represents the distribution of probabilities of a normal variable with these parameters. Let's choose some positive constant d and mark three points, A, M and B, on the X-axis with coordinates, correspondingly, μ−d, μ and μ+d.
Point M(μ) is at the center of our bell curve.
Points A(μ−d) and B(μ+d) are on either side of the center, at equal distance d from it.

The area under the entire bell curve equals to 1 and represents the probability of our normal random variable to take any value.
The area under the bell curve restricted by a point A on the left and a point B on the right represents the probability of our random variable to take value in the interval AB.
We have specifically chosen points A and B symmetric relative to the midpoint M because the bell curve has this symmetry.

It is obvious that the wider interval AB is - the greater the probability of our random variable to take a value within this interval. Since the area under the bell curve restricted by points A and B around the center M depends only on its width (defined by the d constant) and the steepness of a curve, let's measure the width using the same parameter that defines the steepness, the standard deviation σ. This will allow us to evaluate probabilities of a normal random variable to take values within certain interval based only on one parameter - its standard deviation σ.

Traditionally, there are three different intervals around the mean value μ considered to evaluate the values of normal random variable:
d=σ, d=2σ and d=3σ.
Let's quantify them all.

1. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−σ, μ+σ] (the narrowest interval of these three) approximately equals to 0.6827.

2. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−2σ, μ+2σ] (the wider interval) approximately equals to 0.9545.

3. For a normal random variable with mean μ and standard deviation σ the probability of having a value in the interval [μ−3σ, μ+3σ] (the widest interval of these three) approximately equals to 0.9973.
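
These three values can be reproduced with the error function, since for a normal variable the probability of falling within k standard deviations of the mean equals erf(k/√2) regardless of μ and σ. A short sketch; the function name `prob_within` is hypothetical.

```python
from math import erf, sqrt


def prob_within(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal X; does not depend on mu or sigma."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))  # 0.6827, 0.9545, 0.9973
```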

As you see, the value of a normal variable can be predicted with the greatest probability when we choose the widest interval of the three mentioned - the 3σ-interval around its mean. The value will fall into this interval with a very high probability.

The narrower 2σ-interval still gives a relatively high probability that the value of our random variable falls into it.

The narrowest σ-interval has this probability not much higher than 0.5, which makes the prediction for the value of our random variable to fall into it not very reliable.

## Tuesday, November 18, 2014

### Unizor - Probability - Random Variables - Problems 3

Problem 3.1.
Given a random variable. Then a new random variable is formed as a product of a constant by this random variable.
Prove that an expectation (mean) of a product of a constant by a random variable equals to a product of that constant by an expectation of that random variable:
E(a·ξ) = a·E(ξ)

Proof
Recall that an expectation of a random variable is a weighted average of its values with corresponding probabilities as weights.
Assume that random variable ξ takes values x1, x2,...xN with probabilities, correspondingly, p1, p2,...pN.
Then
E(ξ) = x1·p1 + x2·p2 +...+ xN·pN
Random variable a·ξ takes values a·x1, a·x2,...a·xN with probabilities, correspondingly, p1, p2,...pN.
Then
E(a·ξ) = a·x1·p1+a·x2·p2+...+a·xN·pN =
= a·(x1·p1 + x2·p2 +...+ xN·pN) = a·E(ξ)

Problem 3.2.
Given a random variable. Then a new random variable is formed as a product of a constant by this random variable.
Prove that a variance of a product of a constant by a random variable equals to a product of a square of that constant by a variance of that random variable:
Var(a·ξ) = a^2·Var(ξ)

Proof
Recall that a variance of a random variable is an expectation of a square of deviation of this random variable from its expected value (mean). As a formula, it looks like this:
Var(ξ) = E{[ξ−E(ξ)]^2}
Assume that random variable ξ takes values x1, x2,...xN with probabilities, correspondingly, p1, p2,...pN.
Let the expectation (mean) of this random variable be E(ξ)=μ.
Then
Var(ξ) = (x1−μ)^2·p1 + (x2−μ)^2·p2 +...+ (xN−μ)^2·pN
Random variable a·ξ takes values a·x1, a·x2,...a·xN with probabilities, correspondingly, p1, p2,...pN.
As we know from a previous problem,
E(a·ξ) = a·E(ξ) = a·μ
Then
Var(a·ξ) = (a·x1−a·μ)^2·p1 + (a·x2−a·μ)^2·p2+...+ (a·xN−a·μ)^2·pN
= a^2·(x1−μ)^2·p1 + a^2·(x2−μ)^2·p2 +...+ a^2·(xN−μ)^2·pN =
= a^2·Var(ξ)

Problem 3.3.
Given a random variable. Then a new random variable is formed as a product of a constant by this random variable.
Prove that a standard deviation of a product of a constant by a random variable equals to a product of an absolute value of that constant by a standard deviation of that random variable:
σ(a·ξ) = |a|·σ(ξ)

Proof
Recall that a standard deviation of a random variable is a (non-negative) square root of its variance.
Therefore,
σ(a·ξ) = √(Var(a·ξ)) = √(a^2·Var(ξ)) = |a|·√(Var(ξ)) = |a|·σ(ξ)
Notice the absolute value around a constant a. Even if this constant is negative, a standard deviation is always non-negative, as is a variance of a random variable.
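
The three identities of Problems 3.1 to 3.3 can be sanity-checked on a concrete case. The distribution and the constant below are made-up illustration values, not taken from the lecture; a negative constant is used on purpose to exercise the absolute value.

```python
from fractions import Fraction
from math import isclose, sqrt

# Hypothetical example distribution: values x_i with probabilities p_i
values = [-2, 0, 1, 5]
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(3, 8), Fraction(1, 8)]
a = -3  # negative constant


def expectation(xs, ps):
    """Weighted average of the values with probabilities as weights."""
    return sum(x * p for x, p in zip(xs, ps))


def variance(xs, ps):
    """Expectation of the squared deviation from the mean."""
    mu = expectation(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))


scaled = [a * x for x in values]  # values of the random variable a*xi

# Problem 3.1: E(a*xi) = a*E(xi)
assert expectation(scaled, probs) == a * expectation(values, probs)
# Problem 3.2: Var(a*xi) = a^2*Var(xi)
assert variance(scaled, probs) == a ** 2 * variance(values, probs)
# Problem 3.3: sigma(a*xi) = |a|*sigma(xi), compared in floating point
assert isclose(sqrt(variance(scaled, probs)), abs(a) * sqrt(variance(values, probs)))
```

A numeric check on one distribution is of course not a proof; it only guards against slips in the algebra.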

Problem 3.4.
Given N independent identically distributed random variables ξ1, ξ2,...ξN, each having the expectation (mean) μ and the standard deviation σ.
Calculate the expectation, variance and standard deviation of their average
η = (ξ1 + ξ2 +...+ ξN)/N

E(η) = μ
Var(η) = σ^2/N
StdDev(η) = σ/√N

Hint
Use the additive property of an expectation relative to a sum of any random variables and an additive property of a variance relative to a sum of independent random variables.
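
Following the hint, one possible way to write out the derivation is shown below. It combines the additive properties with the results of Problems 3.1 and 3.2 applied to the constant a = 1/N; this is a sketch of the steps, not necessarily the presentation used in the lecture.

```latex
\begin{aligned}
E(\eta) &= \frac{1}{N}\bigl(E(\xi_1)+E(\xi_2)+\dots+E(\xi_N)\bigr) = \frac{N\mu}{N} = \mu \\
\mathrm{Var}(\eta) &= \frac{1}{N^2}\bigl(\mathrm{Var}(\xi_1)+\mathrm{Var}(\xi_2)+\dots+\mathrm{Var}(\xi_N)\bigr) = \frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N} \\
\mathrm{StdDev}(\eta) &= \sqrt{\mathrm{Var}(\eta)} = \frac{\sigma}{\sqrt{N}}
\end{aligned}
```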

IMPORTANT CONSEQUENCE

Averaging of independent identically distributed random variables produces a new random variable with the same mean (expectation) as the mean of the original random variables, but the standard deviation of that average is smaller than the original one by a factor of √N, where N is the number of random variables participating in the averaging.

Therefore, the process of averaging produces a more and more precise estimate of the expectation of a random variable as the number of random variables grows. This is a basis of most statistical calculations.

For example, if one person measures a length of some object, there is always an error in his measurement. His result is a true length plus some error introduced by his measurement (positive or negative). So, we can consider his measure as a random variable with certain mean - a true length of an object and a random error with some standard deviation that depends on his measurement tool and accuracy. Thus, if we measure with a ruler the length of a table with a true length of 1 meter, we can say that the error might be no greater than 1 centimeter, and our results will be from 99 to 101 centimeters.

But if this person measures the length of this object 100 times and the results of these measurements are averaged, the final result will be much closer to the real value of the length of the object since the mean of this average is the same as the mean of every measurement, that is the true length of an object, but the standard deviation around this true length will be √100=10 times smaller than that of each measurement. That brings the typical error of our evaluation of the length of the table down to about 1 millimeter.
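
A small simulation illustrates the effect. It is only a sketch under assumptions that are not in the text: the measurement error is modeled as Gaussian, and the 1 centimeter figure is treated as its standard deviation. All names and the seed are arbitrary.

```python
import random
from statistics import pstdev

random.seed(2014)  # fixed seed so the sketch is reproducible

TRUE_LENGTH_CM = 100.0  # the table from the example
ERROR_SIGMA_CM = 1.0    # assumed standard deviation of a single measurement
N_MEASUREMENTS = 100    # measurements averaged together
N_TRIALS = 2000         # how many times the whole procedure is repeated


def measure() -> float:
    """One measurement: true length plus an assumed Gaussian error."""
    return random.gauss(TRUE_LENGTH_CM, ERROR_SIGMA_CM)


def averaged_measurement() -> float:
    """Average of N_MEASUREMENTS independent measurements."""
    return sum(measure() for _ in range(N_MEASUREMENTS)) / N_MEASUREMENTS


singles = [measure() for _ in range(N_TRIALS)]
averages = [averaged_measurement() for _ in range(N_TRIALS)]

spread_single = pstdev(singles)    # close to 1 cm
spread_average = pstdev(averages)  # close to 0.1 cm, that is about 1 mm

print(round(spread_single, 3), round(spread_average, 3))
```

The ratio of the two spreads should come out close to √100 = 10, up to simulation noise.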

### Unizor - Probability - Normal Distribution - Normal is the Limit

Let's start with formulating again the Central Limit Theorem of the theory of probabilities that illustrates the importance of the Normal distribution of probabilities.

In non-rigorous terms, the Central Limit Theorem states that, given certain conditions, the average of a large number of random variables behaves approximately like a normal random variable.

One of the simplest sufficient conditions, for example, is that the random variables participating in the averaging be independent and identically distributed with finite expectation and variance.
Throughout the history of development of the theory of probabilities the Central Limit Theorem was proven for weaker and weaker conditions. Even if individual variables are not completely independent, even if their probability distributions are not the same, the theorem can still be proven. Arguably, the history of the development of the theory of probabilities is the history of proving the Central Limit Theorem under weaker and weaker conditions.

A rigorous proof of this theorem, even in the simplest case of averaging independent identically distributed random variables, is outside the scope of this course. However, illustrative examples are always possible.

Consider a sequence of N independent Bernoulli experiments with a probability of success equal to 1/2. Their results are the random variables ξi, each taking two values, 0 and 1, with equal probability of 1/2 (index i is in the range from 1 to N).

Now consider their sum
η = ξ1+ξ2+...+ξN

From the lecture about Bernoulli distribution we know that random variable η can take any value from 0 to N with the probability to take a value K (K is in the range from 0 to N) equal to
P(η=K) = C(N,K)·p^K·q^(N−K)
where p is the probability of SUCCESS and q=1−p is the probability of FAILURE.

Let's graph the distribution of probabilities of our random variable η for different values of N with probabilities p = q = 1/2.

N=1: η = ξ1
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/2 on [0,1],
1/2 on [1,2] and
zero after x=2.

N=2: η = ξ1+ξ2
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/4 on [0,1],
1/2 on [1,2],
1/4 on [2,3] and
zero after x=3.

N=3: η = ξ1+ξ2+ξ3
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/8 on [0,1],
3/8 on [1,2],
3/8 on [2,3],
1/8 on [3,4] and
zero after x=4.

N=4: η = ξ1+ξ2+ξ3+ξ4
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/16 on [0,1],
4/16 on [1,2],
6/16 on [2,3],
4/16 on [3,4],
1/16 on [4,5] and
zero after x=5.

N=5: η = ξ1+ξ2+ξ3+ξ4+ξ5
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/32 on [0,1],
5/32 on [1,2],
10/32 on [2,3],
10/32 on [3,4],
5/32 on [4,5],
1/32 on [5,6] and
zero after x=6.

N=6: η = ξ1+ξ2+ξ3+ξ4+ξ5+ξ6
The graph of the distribution of probabilities of η is
zero to the left of x=0,
1/64 on [0,1],
6/64 on [1,2],
15/64 on [2,3],
20/64 on [3,4],
15/64 on [4,5],
6/64 on [5,6],
1/64 on [6,7] and
zero after x=7.
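
The values listed for N = 1 to 6 follow directly from the formula for P(η=K). A minimal sketch that regenerates them with exact fractions; the function name `binomial_pmf` is an assumption made for this illustration.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n: int, p: Fraction = Fraction(1, 2)) -> list:
    """Return [P(eta = 0), ..., P(eta = n)] for the sum of n Bernoulli(p) variables."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]


for n in range(1, 7):
    row = binomial_pmf(n)
    # Show every probability over the common denominator 2**n, as in the listings
    print(f"N={n}:", ", ".join(f"{pk * 2 ** n}/{2 ** n}" for pk in row))
```

For N=6 this prints 1/64, 6/64, 15/64, 20/64, 15/64, 6/64, 1/64, matching the last listing.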

The above graphs obviously more and more resemble the bell curve. When squeezed horizontally by a factor of N (to transform a sum of N random variables into their average), all the graphs will be greater than zero only around the segment [0,1], the range of the average, and inside this segment, as N grows, the graphs will be closer and closer to a bell curve.
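
How close the graphs get to a bell curve can be measured. The sketch below works on the scale of the sum rather than the average, which does not change the comparison. It compares each probability P(η=K) with the normal density that has the same mean N/2 and standard deviation √N/2. The function name and the chosen values of N are illustrative assumptions.

```python
from math import comb, exp, pi, sqrt


def max_gap_to_bell_curve(n: int) -> float:
    """Largest |P(eta = k) - normal density at k| for the sum of n fair 0/1 variables."""
    mu = n / 2           # mean of the sum
    sigma = sqrt(n) / 2  # standard deviation of the sum
    gap = 0.0
    for k in range(n + 1):
        pmf = comb(n, k) / 2 ** n
        density = exp(-((k - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))
        gap = max(gap, abs(pmf - density))
    return gap


for n in (6, 25, 100, 400):
    print(n, max_gap_to_bell_curve(n))  # the gap shrinks as n grows
```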

## Monday, November 10, 2014

### Unizor - Probability - Normal Distribution

In some way the Normal distribution of probabilities is the most important one in an entire Theory of Probabilities and Mathematical Statistics.

Everybody knows about the bell curve. This is a graphical representation of the Normal distribution of probabilities, where the probability of a random variable to be between the values A and B is measured as an area under a bell curve from an abscissa A to an abscissa B.
Not all bell-shaped distributions of probabilities are Normal; there are many others. But all Normal distributions do have a bell-like graphic representation with different position of a center and different steepness of the curve.
Of course, the entire area under a bell curve from negative infinity to positive infinity equals to 1 since this is a probability of a random variable to have any value.

Normal distribution is a continuous type of distribution. A random variable with this distribution of probabilities (we will sometimes call it a normal random variable) can take any real value. The probability of a normal random variable to take any exact specific value equals zero, while the probability that it takes a value in the interval from one real number (left boundary) to another (right boundary) is greater than zero.

Exactly in the middle of a graph is the expected value (mean) of our random variable. Variance, being a measure of deviation of a random variable from its mean value, depends on how steeply the graph rises towards its middle. The steeper the bell-shaped graph around its middle - the more concentrated the values of a normal random variable are around its mean value and, therefore, the smaller variance and standard deviation it has.

What makes a normal random variable so special?

In short (and not very mathematically speaking), the average of many random variables with almost any distributions of probabilities behaves very much like a normal random variable.

Obviously, this property of an average of random variables is conditional. A necessary condition is that the number of random variables participating in an average should be large: the more random variables are averaged together - the more the distribution of an average resembles normal distribution.
More precisely, this is a limit theorem, which means that the measure of "normality" of a distribution of an average of random variables is increasing with the number of participating random variables increasing to infinity, and the limit of a distribution of this average is exactly the normal distribution of probabilities.
The precise mathematical theorem that states this property of an average of random variables is called the Central Limit Theorem in the theory of probabilities.

There are other conditions for this theorem to hold. For instance, independence and identical distribution of random variables participating in the averaging is a sufficient (but not necessary) condition.
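
As an informal illustration of this behavior, the sketch below averages independent variables that are uniformly distributed on [0,1], which individually look nothing like a bell curve, and checks what fraction of the averages falls within one standard deviation of the mean. For a normal random variable this fraction is about 0.68. The choice of the uniform distribution, the number of terms and the seed are all arbitrary assumptions made here.

```python
import random
from math import sqrt

random.seed(1110)  # fixed seed for reproducibility

N_TERMS = 30      # variables averaged together (an arbitrary choice)
N_TRIALS = 20000  # number of simulated averages

mu = 0.5                            # mean of a variable uniform on [0, 1]
sigma_avg = sqrt(1 / 12 / N_TERMS)  # std dev of the average: sqrt(variance 1/12 divided by N_TERMS)

inside = 0
for _ in range(N_TRIALS):
    avg = sum(random.random() for _ in range(N_TERMS)) / N_TERMS
    if abs(avg - mu) <= sigma_avg:
        inside += 1

fraction_within_one_sigma = inside / N_TRIALS
print(round(fraction_within_one_sigma, 3))  # expected to be near 0.68
```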