Friday, January 22, 2016

Unizor - Statistics - Purpose

Unizor - Creative Minds through Art of Mathematics - Math4Teens

Notes to a video lecture on

Purpose of Statistics

Mathematical Statistics is, in some way, a subject with an inverse purpose relative to Theory of Probabilities.

The purpose of Theory of Probabilities is to predict future behavior based on a known distribution of probabilities of a random variable. For example, knowing that the probability of rolling 1 on a die equals 1/6, we can predict that out of the next 1000 rolls number 1 will occur approximately 1000·(1/6)≈167 times. We can even estimate possible deviations from this number.
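To make the prediction above concrete, here is a minimal Python sketch. It assumes the rolls are independent, so the number of 1's follows a binomial distribution; the function name expected_count_and_deviation is introduced here only for illustration and does not come from the lecture.

```python
import math

def expected_count_and_deviation(n_rolls, p):
    """Expected number of occurrences and its standard deviation for
    n_rolls independent trials, each succeeding with probability p."""
    mean = n_rolls * p                        # expected count, N*p
    std = math.sqrt(n_rolls * p * (1 - p))    # binomial standard deviation
    return mean, std

# 1000 rolls of a fair die, counting how often number 1 occurs
mean, std = expected_count_and_deviation(1000, 1 / 6)
print(f"expected count: {mean:.1f}, standard deviation: {std:.1f}")
```

Under these assumptions the expected count is about 167 with a standard deviation of about 12, so a result like 160 or 175 would not be surprising.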

Mathematical Statistics solves an inverse problem: knowing the results of an experiment in the past, determine the distribution of probabilities of a random variable. For example, if we rolled the die 1000 times and number 1 occurred 160 times, we can estimate the probability of this event to be somewhere around 160/1000=0.16.

Why have we decided that the probability of an event can be estimated as the ratio of the number of its occurrences to the total number of experiments? This is based on a theorem proven in Theory of Probabilities, the Law of Large Numbers, which also allows us to evaluate the precision of such an estimate based on the number of experiments.
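As an illustration of what "precision of the estimate" can mean, the sketch below computes the frequency 160/1000 together with a rough margin of error. It assumes independent rolls and uses the common "plus or minus two standard errors" rule of thumb, which is an assumption of this example rather than a result stated in these notes; the name estimate_probability is hypothetical.

```python
import math

def estimate_probability(occurrences, n_trials):
    """Estimate a probability by the observed frequency and attach
    a standard error computed from that same frequency."""
    p_hat = occurrences / n_trials
    std_error = math.sqrt(p_hat * (1 - p_hat) / n_trials)
    return p_hat, std_error

p_hat, std_error = estimate_probability(160, 1000)

# Rule-of-thumb range: estimate plus or minus two standard errors
low, high = p_hat - 2 * std_error, p_hat + 2 * std_error
print(f"estimate: {p_hat:.3f}, rough range: {low:.3f} .. {high:.3f}")
print("1/6 inside the range:", low < 1 / 6 < high)
```

In this sketch the range around 0.16 still contains 1/6, so observing 160 ones in 1000 rolls is compatible with a fair die.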

We can say, therefore, that Mathematical Statistics and Theory of Probabilities are used together with the latter serving as a theoretical foundation of the practical information delivered by the former.

Unfortunately, in many cases people are not concerned about the theoretical foundation and make far-reaching statements based on insufficient or wrongly interpreted data. Consider market research, weather forecasting, predictions of the results of presidential elections, etc. All these activities are extremely important, and that is why it is essential to always understand the limits, the precision and, ultimately, the validity of statistical results.

Here is a short example of a wrong interpretation of statistics. To predict the results of presidential elections in the United States, one company asked 100 people who, in their opinion, would win: the Democrat or the Republican. It received 60 responses in favor of the Democrat and 40 in favor of the Republican. So, it declared victory for the Democrat. Is that right?
Another company did the same and from 100 people received 60 responses in favor of the Republican and 40 in favor of the Democrat. So, it predicted a victory of the Republican. Is that right?
They cannot both be right!
Somewhere there was a mistake. Where? Is there any way we can correct this mistake?

This and many other nuances accompany any statistical research. The purpose of studying Mathematical Statistics in this course is to learn how to do it right.
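The notes do not resolve the polling example at this point. As one hedged illustration of a nuance that can be quantified, the sketch below shows how much the share of answers in a sample may fluctuate purely by chance. It assumes respondents are chosen independently and, hypothetically, that the population is split evenly; neither assumption comes from the example itself, and other issues (who was asked, how the question was worded) are not modeled.

```python
import math

def proportion_std(p, n):
    """Standard deviation of the share of 'yes' answers among n
    independent respondents, each answering 'yes' with probability p."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical assumption: the population is split evenly (p = 0.5)
for n in (100, 1000, 10000):
    print(f"n={n:5d}: typical deviation of the sample share = "
          f"{proportion_std(0.5, n):.3f}")
```

With 100 respondents the typical deviation of the sample share is about 0.05, so a single small poll cannot reliably separate a 60/40 split from a much closer race.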

Before going any further, let's examine a simple probabilistic task. Can we say that the value of a random variable, obtained in a single experiment, is a good estimate of its mathematical expectation?
Obviously, it depends on the probabilistic properties of our random variable. If it has a variance close to zero, the answer to this question is definitely "yes". However, if the variance of our variable is relatively large, as compared to its mathematical expectation, the answer is "no".

Let's assume that we conduct random experiments to observe the values of random variable ξ. Let's further assume that our experiments are independent and the probabilistic characteristics of our variable are not changing from one experiment to another.
Let's say, as a result of N experiments we have obtained values X1, X2,...,XN of our random variable. What can be done with these values to evaluate certain probabilistic characteristics of random variable ξ?

Here is a simple approach to evaluate the mathematical expectation of ξ.
Calculate an average of our N values from a series of N experiments with random variable ξ:
M = (X1+X2+...+XN) / N
Since each Xi is a value that our random variable ξ took in the i-th experiment, average M can be interpreted as a result of a single combined experiment that occurs when we observe a random variable
η = (ξ1+ξ2+...+ξN) / N
where each ξi is a random variable distributed exactly as our random variable ξ, and all ξi are independent of each other.

Let's investigate simple properties of random variable η.
Mathematical expectation of η is exactly the same as that of ξ because
E(η) = E[(ξ1+...+ξN) / N] =
= [E(ξ1)+...+E(ξN)] / N =
= [E(ξ)+...+E(ξ)] / N =
= N·E(ξ) / N = E(ξ)
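Variance of η is N times smaller than that of ξ because a constant factor 1/N comes out of the variance as 1/N² and, for independent random variables, the variance of a sum equals the sum of the variances: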
Var(η) = Var[(ξ1+...+ξN) / N] =
= [Var(ξ1)+...+Var(ξN)] / N² =
= [Var(ξ)+...+Var(ξ)] / N² =
= N·Var(ξ) / N² = Var(ξ) / N

As we see, random variable η has the same mathematical expectation as ξ, but its values are, generally, closer to this expectation since its variance is N times smaller (and, therefore, its standard deviation is smaller by a factor of √N).
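The two properties of η derived above can be checked numerically. The following sketch takes ξ to be the result of rolling a fair die (an illustrative choice, with E(ξ)=3.5 and Var(ξ)=35/12), repeatedly observes the average of N rolls and compares the empirical variance of these averages with Var(ξ)/N. The function names, the number of trials and the seed are assumptions of this example.

```python
import random
import statistics

DIE_MEAN = 3.5       # E(ξ) for a fair die
DIE_VAR = 35 / 12    # Var(ξ) for a fair die

def sample_mean_of_rolls(n, rng):
    """One observation of η: the average of n independent fair-die rolls."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

def simulate(n, trials=5000, seed=0):
    """Observe η many times; return its empirical mean and variance."""
    rng = random.Random(seed)
    values = [sample_mean_of_rolls(n, rng) for _ in range(trials)]
    return statistics.fmean(values), statistics.pvariance(values)

for n in (1, 10, 100):
    mean, var = simulate(n)
    print(f"N={n:3d}: mean of η = {mean:.3f}, variance of η = {var:.4f}, "
          f"Var(ξ)/N = {DIE_VAR / n:.4f}")
```

In each case the empirical mean should stay close to 3.5, while the empirical variance should shrink roughly in proportion to 1/N, in agreement with the derivation.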
