Unizor - Creative Mind through Art of Mathematics: Unizor - Statistical Distribution - Task D

Statistical Distribution
Problem 4 - Precipitation

To learn effectively from problem solving, try to solve these problems on your own, then listen to the lecture, and then try to solve them again on your own.

In order to determine if the claims about climate change are true or false, the monthly precipitation data at New York's Central Park have been gathered from the official Web site New York Historical Monthly Precipitation for the period from 1906 to 2015.
Based on these data, determine for each decade (10-year period) the mean, standard deviation and 95% certainty interval of the level of precipitation for each month of the year as well as for annualized levels.
Make a judgement about the validity of the claims of New York climate change by comparing the mean precipitation level in different months between two decades separated by a century, 1906-1915 and 2006-2015. Frame the comparison as a question of whether the difference between the two mean levels differs from zero.

Solution

The raw data with some additional calculations in a spreadsheet format can be downloaded from Monthly Precipitation NY (data).
This spreadsheet contains original data about monthly precipitation for each month and annualized precipitation during 1906-2015 period.
In addition, these data are supplemented with the following calculations, made for each month and for the annualized levels over the entire period:
=MIN(...)
=MAX(...)
=SLOPE(...)
=AVERAGE(...)
=STDEV(...)
=VAR(...)
As we see, different months have different slopes over the entire period, some positive and some negative, some steeper than others (the largest increase in precipitation is in April and the smallest in January; the largest decrease is in October and the smallest in March). The annualized precipitation slope is positive, about 2 inches per century (about 4% growth in 100 years).
The degree of certainty of this conclusion is another issue, which we consider below.
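
For readers who prefer code to a spreadsheet, the functions listed above can be approximated with Python's standard library. This is only a sketch: the function name column_summary and the five data points are made up for illustration and are not the Central Park data. It assumes that STDEV and VAR are the sample (n−1) versions and that SLOPE is the ordinary least-squares slope of the values against the years.

```python
import statistics

def column_summary(years, values):
    """Approximate the spreadsheet functions MIN, MAX, SLOPE, AVERAGE, STDEV, VAR."""
    mean_x = statistics.mean(years)
    mean_y = statistics.mean(values)
    # Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return {
        "min": min(values),
        "max": max(values),
        "slope": sxy / sxx,                   # inches per year
        "average": mean_y,
        "stdev": statistics.stdev(values),    # sample standard deviation (n-1)
        "var": statistics.variance(values),   # sample variance (n-1)
    }

# Made-up illustrative values, NOT the Central Park data.
years = [2001, 2002, 2003, 2004, 2005]
inches = [3.0, 4.5, 2.5, 5.0, 4.0]

summary = column_summary(years, inches)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A slope expressed in inches per year is multiplied by 100 to get the "inches per century" figure quoted above.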

The spreadsheet with original data and calculations per decade can be downloaded from Monthly Temp NY (analysis).
Examine it. Here are some conclusions.

Let random variables ξ₁₉₀₆, ξ₁₉₀₇,...ξ₁₉₁₅ represent January precipitation during the years 1906-1915 (calculations for other months and annualized precipitation are similar).
It is reasonable to assume that all of them are independent and have identical normal distributions with mathematical expectation μ_190# and variance σ²_190#, which represent our mathematical model of the precipitation during these ten years.

Random variable
ξ₁ = (ξ₁₉₀₆+...+ξ₁₉₁₅)/10
represents average January precipitation during the 1906-1915 decade. Its expectation is μ₁=μ_190# and its variance equals σ²₁=σ²_190#/10.

Analogously, independent and identically distributed random variables ξ₂₀₀₆, ξ₂₀₀₇,...ξ₂₀₁₅ represent January precipitation during the years 2006-2015, each having mathematical expectation μ_200# and variance σ²_200#, which represent our mathematical model of the precipitation during these ten years.

Random variable
ξ₂ = (ξ₂₀₀₆+...+ξ₂₀₁₅)/10
represents average January precipitation during the 2006-2015 decade. Its expectation is μ₂=μ_200# and its variance equals σ²₂=σ²_200#/10.
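
The statement that a ten-year average has one tenth of the variance of a single year can be checked numerically. The sketch below simulates many decades of independent normal values and compares the variance of the decade averages with σ²/10. The constants MU, SIGMA and TRIALS are arbitrary illustrative assumptions, not values taken from the spreadsheet.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU = 3.5        # assumed yearly mean (arbitrary illustrative value)
SIGMA = 1.6     # assumed yearly standard deviation (arbitrary illustrative value)
YEARS = 10      # number of years averaged in one decade
TRIALS = 20000  # number of simulated decades

# Each simulated decade is the average of 10 independent normal values.
decade_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(YEARS)])
    for _ in range(TRIALS)
]

simulated = statistics.variance(decade_means)
predicted = SIGMA ** 2 / YEARS

print(f"simulated variance of the decade average: {simulated:.4f}")
print(f"predicted sigma^2 / 10:                   {predicted:.4f}")
```

The two printed numbers should agree to within a few percent; the agreement improves as TRIALS grows.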

On the simplest level, to determine the validity of the claims about climate change during the 20th century, we can compare mathematical expectations
μ₁ = E(ξ₁) and μ₂ = E(ξ₂)
that is, mean precipitation at the beginning of the 20th and 21st centuries.
If they are different (or, which is the same, if μ₂−μ₁ ≠ 0), we have a confirmation of a shift in the precipitation. Based on this, we can calculate the absolute and relative increase or decrease of precipitation during this 100-year period.

In the analysis spreadsheet mentioned above we have 10 sample values of January precipitation in each year of 1906-1915 decade
x₁₉₀₆, x₁₉₀₇,...x₁₉₁₅
and 10 sample values for January precipitation in each year of 2006-2015 decade
x₂₀₀₆, x₂₀₀₇,...x₂₀₁₅.
This gives one sample value for random variable ξ₁ (average precipitation during 1906-1915 period)
X₁ = (x₁₉₀₆+...+x₁₉₁₅)/10
and one sample value for random variable ξ₂ (average precipitation during 2006-2015)
X₂ = (x₂₀₀₆+...+x₂₀₁₅)/10
which we have to compare.

The difference X₂−X₁ is a single sample value of a normal random variable ξ₂−ξ₁, whose mathematical expectation (μ₂−μ₁) we want to compare with 0. This difference is the best available estimate of μ₂−μ₁. If we knew the variance of ξ₂−ξ₁, we could evaluate the range of values μ₂−μ₁ can take with whatever level of certainty we need.
For example, if σ² is a variance of ξ₂−ξ₁, we can say with 95% certainty that
|(μ₂−μ₁)−(X₂−X₁)| ≤ 2σ

Unfortunately, we don't know the variance of ξ₂−ξ₁.
We do, however, know that, since ξ₁ and ξ₂ are assumed to be independent random variables,
Var(ξ₂−ξ₁) = Var(ξ₂)+Var(ξ₁)
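
For completeness, here is a short derivation of this rule. It uses the covariance Cov, a symbol that does not appear elsewhere in this text, and relies only on the independence assumption stated above.

```latex
\begin{aligned}
\operatorname{Var}(\xi_2-\xi_1)
  &= \operatorname{Var}(\xi_2) + \operatorname{Var}(\xi_1)
     - 2\,\operatorname{Cov}(\xi_2,\xi_1) \\
  &= \operatorname{Var}(\xi_2) + \operatorname{Var}(\xi_1),
  \qquad \text{since independence gives } \operatorname{Cov}(\xi_2,\xi_1)=0 .
\end{aligned}
```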

Now we have two ways to evaluate the variances of random variables ξ₁ and ξ₂.
We can assume that the variance does not change with time and calculate a sample variance based on all January data from 1906 to 2015 - a questionable assumption, but the estimate is good since it rests on 110 sample values.
Alternatively, we can calculate sample variances separately for the 1906-1915 decade and the 2006-2015 decade using only 10 sample values for each - a better assumption, but the sample data are not as numerous.

Let's use both methods and compare the results.

Assuming the variance does not change with time, the calculations based on the entire set of data from 1906 to 2015 show the sample standard deviation of January precipitation to be σ=1.645 and the sample variance to be σ²=2.707.
The sample variance of the decade averages ξ₁ or ξ₂ is 2.707/10=0.271.
The difference ξ₂−ξ₁ has sample variance 0.271·2=0.541 and its standard deviation (square root of variance) is 0.736.
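
The arithmetic of this first method can be retraced in a few lines of Python. The only input is the sample variance 2.707 quoted above; the variable names are chosen here for illustration.

```python
import math

YEARS_PER_DECADE = 10

# Sample variance of January precipitation over 1906-2015 (value from the text).
sample_var_year = 2.707

var_decade_mean = sample_var_year / YEARS_PER_DECADE  # variance of one decade average
var_difference = 2 * var_decade_mean                  # two independent decade averages
sd_difference = math.sqrt(var_difference)             # standard deviation of the difference

print(round(var_decade_mean, 3), round(var_difference, 3), round(sd_difference, 3))
```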

With this standard deviation and the sample average January precipitation during the 1906-1915 and 2006-2015 periods, 3.911 and 3.547 correspondingly, we can state with 95% certainty that
|(μ₂−μ₁)−(3.547−3.911)| ≤ 2·0.736
That is,
−0.364−1.472 ≤ μ₂−μ₁ ≤ −0.364+1.472
or
−1.836 ≤ μ₂−μ₁ ≤ 1.108
As we see, we cannot say with 95% certainty that the decade average January precipitation in New York moved up or down from the 1900's to the 2000's.
Even if we reduce the level of certainty to 68% (single sigma rule), the left margin is still negative and the right margin is still positive, which means that we cannot say there is a difference between mean January precipitation in the 1900's and the 2000's.
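
The two intervals discussed above (2σ for 95% and 1σ for 68%) can be computed with a small helper. The function name k_sigma_interval is an illustrative choice; the three numbers are the January values quoted in the text.

```python
def k_sigma_interval(estimate, sigma, k):
    """Return (low, high) for the interval estimate ± k·sigma."""
    return estimate - k * sigma, estimate + k * sigma

x1 = 3.911     # sample average January precipitation, 1906-1915 (from the text)
x2 = 3.547     # sample average January precipitation, 2006-2015 (from the text)
sigma = 0.736  # standard deviation of the difference (from the text)

for k, label in ((2, "95%"), (1, "68%")):
    low, high = k_sigma_interval(x2 - x1, sigma, k)
    zero_inside = low <= 0 <= high
    print(f"{label}: {low:.3f} <= mu2 - mu1 <= {high:.3f}, zero inside: {zero_inside}")
```

Zero lies inside both intervals, which is exactly why no January trend can be claimed at either level of certainty.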

The same approach applied to the other months and to annualized precipitation produces different results. Here is a list of conclusions about whether there is a statistically significant (with 95% certainty) change in precipitation in New York from the 1900's to the 2000's:
Jan - NO
Feb - NO
Mar - NO
Apr - NO
May - NO
Jun - YES (increase)
Jul - NO
Aug - NO
Sep - NO
Oct - NO
Nov - NO
Dec - NO
Year - YES (increase)

Let's try a different approach and assume that variances do change with time. Then our only choice is to evaluate the variance of the January precipitation separately for the 1906-1915 and 2006-2015 periods, based only on the 10 values available in each decade.
Sample variance of the January precipitation during 1906-1915 is
Var(ξ₁₉₀₆) = 2.816.
Sample variance of the average January precipitation during this period is
Var(ξ₁) = 2.816/10 = 0.282.
Sample variance of the January precipitation during 2006-2015 is
Var(ξ₂₀₀₆) = 1.232.
Sample variance of the average January precipitation during this period is
Var(ξ₂) = 1.232/10 = 0.123.
Sample variance of the difference between average decade precipitations is
Var(ξ₂−ξ₁) = 0.282+0.123 = 0.405.
Standard deviation (square root from the above variance) is
σ = 0.636 (not much different from σ=0.736, the value obtained when we calculate it from the entire data set).
Finally, the 2σ interval around the sample difference is:
−0.364−1.273 ≤ μ₂−μ₁ ≤ −0.364+1.273
or
−1.637 ≤ μ₂−μ₁ ≤ 0.909
The conclusion about absence of average January precipitation trend in New York is the same.
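
The second method can be retraced the same way. In this sketch the function name difference_interval and its parameters are illustrative; the inputs are the rounded sample means and variances quoted above, so the last printed digit can differ slightly from the figures in the text, which appear to come from unrounded spreadsheet values.

```python
import math

def difference_interval(mean_old, var_old, mean_new, var_new, n=10, k=2):
    """k-sigma interval for mu_new - mu_old, using a separate sample variance per decade."""
    # Variance of each decade average is the yearly variance divided by n;
    # variances of independent averages add.
    sd = math.sqrt(var_old / n + var_new / n)
    diff = mean_new - mean_old
    return diff - k * sd, diff + k * sd

# January values from the text: 1906-1915 first, then 2006-2015.
low, high = difference_interval(3.911, 2.816, 3.547, 1.232)
print(f"{low:.3f} <= mu2 - mu1 <= {high:.3f}")
```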
The same approach for the other months and for annualized precipitation produces different results. Here is a list of conclusions about whether there is a change in precipitation in New York from the 1900's to the 2000's:
Jan - NO
Feb - NO
Mar - NO
Apr - NO
May - NO
Jun - YES (increase)
Jul - NO
Aug - NO
Sep - NO
Oct - NO
Nov - NO
Dec - YES (increase)
Year - YES (increase)

As we see, this second approach to evaluating the variance Var(ξ₂−ξ₁) produced almost the same result as the first one. The only difference is for the month of December.
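
The YES/NO entries in both lists follow one rule: a change counts as significant when the 2σ interval around the sample difference does not contain zero. Here is a minimal sketch of that rule. The function name verdict is hypothetical, only the January numbers come from the text, and the last call uses made-up values to show what a YES looks like.

```python
import math

def verdict(mean_old, mean_new, var_old, var_new, n=10, k=2):
    """Classify a change as YES (increase), YES (decrease) or NO at the k-sigma level."""
    sd = math.sqrt(var_old / n + var_new / n)
    diff = mean_new - mean_old
    if diff - k * sd > 0:
        return "YES (increase)"   # whole interval lies above zero
    if diff + k * sd < 0:
        return "YES (decrease)"   # whole interval lies below zero
    return "NO"                   # interval contains zero

# January, first method: the same variance 2.707 for both decades.
print("Jan, common variance:    ", verdict(3.911, 3.547, 2.707, 2.707))
# January, second method: separate variances per decade.
print("Jan, separate variances: ", verdict(3.911, 3.547, 2.816, 1.232))
# Made-up values, not from the data, to illustrate a significant increase.
print("hypothetical month:      ", verdict(3.0, 5.0, 1.0, 1.0))
```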

CONCLUSION
We can state with 95% certainty that the decade average precipitation in June is higher during the 2000's than during the 1900's. December produced different results under the two methodologies we used. Annualized decade average precipitation is also higher during the 2000's than during the 1900's.
An upward slope of about 2" per century in annual precipitation is observed.
Obviously, we cannot state with any mathematical precision which factors contributed more or less to this increase. Politicians are making their careers fighting over these issues, a debate that is completely outside the scope of this presentation.

Tuesday, June 14, 2016
