Tuesday, June 14, 2016

Unizor - Statistical Distribution - Task D - Precipitation





Statistical Distribution
Problem 4 - Precipitation


To effectively learn from problem solving, try to solve these problems just by yourself, then listen to a lecture and then try to solve them again by yourself.

In order to determine if the claims about climate change are true or false, the monthly precipitation data at New York's Central Park have been gathered from the official Web site New York Historical Monthly Precipitation for the period from 1906 to 2015.
Based on these data, determine for each decade (10-year period) the mean, standard deviation and 95% certainty interval of the level of precipitation for each month of the year as well as for annualized levels.
Judge the validity of the claims about New York climate change by comparing the mean precipitation level for each month between two decades separated by a century, 1906-1915 and 2006-2015. To do so, compare the difference between the two mean levels with zero.

Solution

The raw data with some additional calculations in a spreadsheet format can be downloaded from Monthly Precipitation NY (data).
This spreadsheet contains the original monthly precipitation data and the annualized precipitation for the 1906-2015 period.
In addition, these data are supplemented with the following calculations, for each month and for the annualized values, over the entire period:
=MIN(...)
=MAX(...)
=SLOPE(...)
=AVERAGE(...)
=STDEV(...)
=VAR(...)
As we see, different months have different slopes over the entire period, some positive, some negative, some larger, some smaller (the highest increase in precipitation is in April, the lowest in January; the highest decrease in precipitation is in October, the lowest in March). The annualized precipitation slope is positive, about 2 inches per century (about 4% growth in 100 years).
The degree of certainty of this conclusion is another issue, which we consider below.
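For readers who prefer code to a spreadsheet, the summary functions listed above can be sketched in Python. This is only an illustration: the function name `summarize` and the sample numbers are hypothetical (they are not taken from the Central Park data), and the slope is computed as the ordinary least-squares slope of precipitation against year, which is what the spreadsheet SLOPE function returns.

```python
import statistics

def summarize(years, values):
    """Return min, max, least-squares slope, mean, sample st. dev. and sample variance."""
    mean_year = statistics.fmean(years)
    mean_value = statistics.fmean(values)
    # Least-squares slope: sum((x - mx) * (y - my)) / sum((x - mx)^2)
    numerator = sum((x - mean_year) * (y - mean_value) for x, y in zip(years, values))
    denominator = sum((x - mean_year) ** 2 for x in years)
    return {
        "min": min(values),
        "max": max(values),
        "slope": numerator / denominator,
        "average": mean_value,
        "stdev": statistics.stdev(values),    # divides by n - 1, like STDEV
        "var": statistics.variance(values),   # divides by n - 1, like VAR
    }

# Hypothetical January precipitation values (inches), for illustration only.
years = [2001, 2002, 2003, 2004, 2005]
values = [3.0, 3.5, 2.5, 4.0, 3.5]
stats = summarize(years, values)
print(stats)
```

A slope in inches per year multiplied by 100 gives the "inches per century" figure used in the text.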

The spreadsheet with original data and calculations per decade can be downloaded from Monthly Temp NY (analysis).
Examine it. Here are some conclusions.

Let random variables ξ1906, ξ1907, ..., ξ1915 represent January precipitation in the years 1906-1915 (calculations for other months and annualized precipitation are similar).
It is reasonable to assume that all of them are independent and have an identical normal distribution with mathematical expectation μ190# and variance σ²190#, which represent our mathematical model of the precipitation during these ten years.

Random variable
ξ1 = (ξ1906+...+ξ1915)/10
represents the average January precipitation during the 1906-1915 decade. Its expectation is μ1 = μ190# and its variance equals σ²1 = σ²190#/10.
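The expectation and variance of the decade average follow from the linearity of expectation and from the assumed independence of the yearly values. As a short derivation, writing μ and σ² for the common expectation and variance of each yearly value (μ190# and σ²190# in the text), and ξ_k for the value in year k:

```latex
\begin{aligned}
E(\xi_1) &= \frac{1}{10}\sum_{k=1906}^{1915} E(\xi_k)
          = \frac{1}{10}\cdot 10\,\mu = \mu,\\
\operatorname{Var}(\xi_1) &= \frac{1}{10^2}\sum_{k=1906}^{1915} \operatorname{Var}(\xi_k)
          = \frac{10\,\sigma^2}{100} = \frac{\sigma^2}{10}.
\end{aligned}
```

The second line uses independence: without it, covariance terms would appear in the sum.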

Analogously, independent and identically distributed random variables ξ2006, ξ2007, ..., ξ2015 represent January precipitation in the years 2006-2015, each having mathematical expectation μ200# and variance σ²200#, which represent our mathematical model of the precipitation during these ten years.

Random variable
ξ2 = (ξ2006+...+ξ2015)/10
represents the average January precipitation during the 2006-2015 decade. Its expectation is μ2 = μ200# and its variance equals σ²2 = σ²200#/10.

On the simplest level, to determine the validity of the claims about climate change during the 20th century, we can compare the mathematical expectations
μ1 = E(ξ1) and μ2 = E(ξ2)
that is, the mean precipitations at the beginning of the 20th and 21st centuries.
If they are different (or, which is the same, if μ2−μ1 ≠ 0), we have a confirmation of a shift in the precipitation. Based on this, we can calculate the absolute and relative increase or decrease of precipitation during this 100-year period.

In the analysis spreadsheet mentioned above we have 10 sample values of January precipitation, one for each year of the 1906-1915 decade
x1906, x1907, ..., x1915
and 10 sample values of January precipitation, one for each year of the 2006-2015 decade
x2006, x2007, ..., x2015.
This gives one sample value for random variable ξ1 (average precipitation during 1906-1915)
X1 = (x1906+...+x1915)/10
and one sample value for random variable ξ2 (average precipitation during 2006-2015)
X2 = (x2006+...+x2015)/10
which we have to compare.

The difference X2−X1 is a single sample value of a normal random variable ξ2−ξ1, mathematical expectation of which (μ2−μ1) we want to compare with 0. This difference is the best possible estimate of μ2−μ1. If we knew its variance, we could evaluate the range of possible values μ2−μ1 can take with whatever level of certainty we need.
For example, if σ² is a variance of ξ2−ξ1, we can say with 95% certainty that
|(μ2−μ1)−(X2−X1)| ≤ 2σ

Unfortunately, we don't know the variance of ξ2−ξ1.
We do, however, know that, since ξ1 and ξ2 are assumed to be independent random variables,
Var(ξ2−ξ1) = Var(ξ2)+Var(ξ1)
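This identity holds because Var(−ξ1) = (−1)²·Var(ξ1) = Var(ξ1) and the variances of independent variables add. A quick numerical check can be sketched in Python; the distribution parameters, the sample size and the helper name `sample_variance` are arbitrary choices for illustration, not values from the precipitation data.

```python
import random

def sample_variance(data):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)

random.seed(2016)  # fixed seed so the run is repeatable
n = 100_000
# Two independent normal samples with arbitrary, illustrative parameters.
a = [random.gauss(3.9, 0.53) for _ in range(n)]
b = [random.gauss(3.5, 0.35) for _ in range(n)]
diff = [y - x for x, y in zip(a, b)]

var_sum = sample_variance(a) + sample_variance(b)
var_diff = sample_variance(diff)
print(var_sum, var_diff)  # the two numbers should be close to each other
```

The two printed values differ only by twice the sample covariance of the two series, which shrinks toward zero as the sample grows when the series are independent.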

Now we have two ways to evaluate the variances of random variables ξ1 and ξ2.
We can assume that variance does not change with time and calculate a sample variance based on all January data from 1906 to 2015 - a questionable assumption, but a good estimate since the sample data are quite representative.
Alternatively, we calculate sample variances separately for 1906-1915 decade and 2006-2015 decade using only 10 sample values for each - better assumption, but sample data are not as numerous.

Let's use both methods and compare the results.

Assuming the variance does not change with time, the calculations based on the entire set of data from 1906 to 2015 show the sample standard deviation of January precipitation to be σ=1.645 and the sample variance to be σ²=2.707.
The sample variance of the decade averages ξ1 or ξ2 is 2.707/10=0.271.
The difference ξ2−ξ1 has sample variance 0.271·2=0.541, and its standard deviation (square root of variance) is 0.736.

With this standard deviation and the sample average January precipitation during the 1906-1915 and 2006-2015 periods, 3.911 and 3.547 correspondingly, we can state with 95% certainty that
|(μ2−μ1)−(3.547−3.911)| ≤ 2·0.736
That is,
−0.364−1.472 ≤ μ2−μ1 ≤ −0.364+1.472

or
−1.836 ≤ μ2−μ1 ≤ 1.108
As we see, we cannot say with 95% certainty that there is a positive or negative movement of the decade average January precipitation from 1900's to 2000's in New York.
Even if we reduce the level of certainty to 68% (single sigma rule), the left margin is still negative and the right margin is still positive (−1.100 ≤ μ2−μ1 ≤ 0.372), which means that we cannot say there is a difference between mean January precipitations in 1900's and 2000's.
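The arithmetic of this first method can be sketched in Python. The function name `interval_common_variance` and its parameters are illustrative labels, not something from the spreadsheet; the numbers are the ones quoted above (sample variance 2.707, decade averages 3.911 and 3.547), and the multiplier `k` is the number of standard deviations (2 for about 95%, 1 for about 68%).

```python
import math

def interval_common_variance(mean1, mean2, yearly_variance, n=10, k=2):
    """Interval for mu2 - mu1 when both decades share one yearly variance."""
    var_of_average = yearly_variance / n      # variance of one decade average
    var_of_difference = 2 * var_of_average    # independent averages: variances add
    half_width = k * math.sqrt(var_of_difference)
    diff = mean2 - mean1
    return diff - half_width, diff + half_width

# January values quoted in the text.
low95, high95 = interval_common_variance(3.911, 3.547, 2.707, k=2)
low68, high68 = interval_common_variance(3.911, 3.547, 2.707, k=1)
print(round(low95, 3), round(high95, 3))
print(round(low68, 3), round(high68, 3))
# An interval that contains zero does not support a change in the mean.
print(low95 <= 0 <= high95, low68 <= 0 <= high68)
```

Both intervals contain zero, which is the reason for the "NO" verdict for January.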

The same approach applied to other monthly and annualized precipitations produces different results. Here is a list of conclusions about whether the change in precipitation in New York from 1900's to 2000's is statistically significant (with 95% certainty):
Jan - NO
Feb - NO
Mar - NO
Apr - NO
May - NO
Jun - YES (increase)
Jul - NO
Aug - NO
Sep - NO
Oct - NO
Nov - NO
Dec - NO
Year - YES (increase)

Let's try a different approach and assume that variances do change with time. Then our only choice is to evaluate the variance of the January precipitation separately for the 1906-1915 and 2006-2015 periods, based only on the 10 values available for each decade.
Sample variance of the January precipitation during 1906-1915 is
Var(ξ1906) = 2.816.
Sample variance of the average January precipitation during this period is
Var(ξ1) = 2.816/10 = 0.282.
Sample variance of the January precipitation during 2006-2015 is
Var(ξ2006) = 1.232.
Sample variance of the average January precipitation during this period is
Var(ξ2) = 1.232/10 = 0.123.
Sample variance of the difference between average decade precipitations is
Var(ξ2−ξ1) = 0.282+0.123 = 0.405.
Standard deviation (square root from the above variance) is
σ = 0.636 (not much different from the value calculated from the entire 1906-2015 data set, σ=0.736).
Finally, the 95% interval around the difference of the sample means is:
−0.364−1.273 ≤ μ2−μ1 ≤ −0.364+1.273

or
−1.637 ≤ μ2−μ1 ≤ 0.909
The conclusion about absence of average January precipitation trend in New York is the same.
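The second method differs from the first only in how the variance of the difference is assembled. A minimal Python sketch, using the January sample variances quoted above (2.816 and 1.232); the function name `interval_separate_variances` is a hypothetical label, and `k = 2` again stands for the two-sigma (about 95%) rule.

```python
import math

def interval_separate_variances(mean1, var1, mean2, var2, n=10, k=2):
    """Interval for mu2 - mu1 with a separate yearly sample variance per decade."""
    # Variance of each decade average, then add them (independence assumed).
    var_of_difference = var1 / n + var2 / n
    half_width = k * math.sqrt(var_of_difference)
    diff = mean2 - mean1
    low, high = diff - half_width, diff + half_width
    significant = not (low <= 0 <= high)   # True only if zero is outside the interval
    return low, high, significant

low, high, significant = interval_separate_variances(3.911, 2.816, 3.547, 1.232)
print(round(low, 3), round(high, 3), significant)
```

The bounds may differ from the ones in the text in the last digit, because the text rounds intermediate values before combining them.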
The same approach for other months and annualized precipitation produces different results. Here is a list of conclusions that we can make about the validity of the change in precipitation in New York from 1900's to 2000's:
Jan - NO
Feb - NO
Mar - NO
Apr - NO
May - NO
Jun - YES (increase)
Jul - NO
Aug - NO
Sep - NO
Oct - NO
Nov - NO
Dec - YES (increase)
Year - YES (increase)

As we see, this second approach to evaluating the variance Var(ξ2−ξ1) produced almost the same result as the first one; the only difference is for the month of December.

CONCLUSION
We can state with 95% certainty that the decade average precipitation in June during 2000's is higher than during 1900's. December produced different results under the two methodologies we used. The annualized decade average precipitation is also higher during 2000's than during 1900's.
An upward slope of about 2" per century in annual precipitation is observed.
Obviously, we cannot state with any mathematical precision which factors contributed more or less to this increase. Politicians are making their careers fighting over these issues, which are completely outside the scope of this presentation.
