Green Friday - Is climate change real?

1. Climate change: reality or fake news?

Everyone is talking about climate change, and many doubt that it really exists… But climate is one of the best-documented scientific areas, and tons of data are available; we just need to crunch the numbers with the right tools!

So let us use the German weather records and see whether the effects of climate change in Germany (temperatures, rain, …) show up in the data. We will use the data from the Deutscher Wetterdienst (DWD), which is made available through the R library rdwd (see this page). This library contains functions to query the huge database of the DWD.

2. Let's get started

a. accessing the data

First, we need to install some packages:
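A minimal sketch for this step, assuming rdwd (from CRAN) is the package the rest of the analysis needs; the full package list of the original is not shown, and both the helper name `ensure_packages` and the CRAN mirror URL are assumptions. It installs only what is missing and reports which packages can be loaded:

```r
# Hypothetical helper: install the CRAN packages in `pkgs` that are not yet
# available, then return a named logical vector telling which ones load.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  available <- function(p) requireNamespace(p, quietly = TRUE)
  missing <- pkgs[!vapply(pkgs, available, logical(1))]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  vapply(pkgs, available, logical(1))
}

# Assumption: rdwd is the package this tutorial relies on.
ensure_packages("rdwd")
```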

We can query the database using the name of a weather station (e.g. Potsdam) and collect a time series of various weather variables. This time series is available for different time intervals, for example daily or monthly. If you want to use another location, you can check this interactive map and look for the blue dots.
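The query can be sketched with rdwd's `selectDWD()`, which looks up the download URL for a station, and `dataDWD()`, which downloads and reads the file. This is an assumption about how the table below was obtained; the wrapper `get_dwd_climate` is hypothetical, and `var = "kl"` selects the standard climate variables:

```r
# Hypothetical wrapper around rdwd: fetch one station's climate ("kl") table.
#   res = time resolution, e.g. "daily" or "monthly"
#   per = "recent" or "historical"
get_dwd_climate <- function(station, res = "daily", per = "recent") {
  # Look up the file URL for this station in rdwd's index of the DWD server
  link <- rdwd::selectDWD(station, res = res, var = "kl", per = per)
  # Download and read the file; varnames = TRUE (passed on to readDWD)
  # appends the German variable names, e.g. TMK -> TMK.Lufttemperatur
  rdwd::dataDWD(link, read = TRUE, varnames = TRUE)
}
```

With this helper, `clim <- get_dwd_climate("Potsdam")` would fetch the recent daily Potsdam records, and `head(clim)` prints the first rows.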

Have a look at the downloaded table to see what kind of variables are available:

  STATIONS_ID MESS_DATUM QN_3 FX.Windspitze FM.Windgeschwindigkeit QN_4
1        3987 2018-05-27   10           9.2                    3.2    3
2        3987 2018-05-28   10          10.0                    4.5    3
3        3987 2018-05-29   10          12.5                    5.2    3
4        3987 2018-05-30   10          10.8                    3.6    3
5        3987 2018-05-31   10           8.1                    3.3    3
  RSK.Niederschlagshoehe RSKF.Niederschlagsform SDK.Sonnenscheindauer
1                    0.4                      6                10.250
2                    0.0                      0                13.433
3                    0.0                      0                15.383
4                    0.0                      0                10.317
5                    0.0                      0                11.733
  SHK_TAG.Schneehoehe NM.Bedeckungsgrad VPM.Dampfdruck PM.Luftdruck
1                   0               4.3           15.7      1008.94
2                   0               4.5           16.0      1007.71
3                   0               0.8           13.3      1004.02
4                   0               3.3           15.6      1002.91
5                   0               3.2           16.6      1003.17
  TMK.Lufttemperatur UPM.Relative_Feuchte TXK.Lufttemperatur_Max
1               20.5                67.58                   28.4
2               24.3                56.46                   32.2
3               25.6                43.17                   32.6
4               24.1                53.42                   33.2
5               24.4                57.63                   31.9
  TNK.Lufttemperatur_Min TGK.Lufttemperatur_5cm_min eor
1                   13.8                       11.4 eor
2                   15.8                       13.1 eor
3                   18.2                       14.9 eor
4                   18.2                       15.1 eor
5                   17.1                       14.4 eor

b. comparing variables

We have downloaded a recent dataset containing the data for the last 2 years. We can look at the time series of a given variable, for example TMK.Lufttemperatur (daily mean air temperature), by plotting time on the x-axis and temperature on the y-axis:
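A minimal plotting sketch in base R, assuming `clim` is the daily table shown above and that MESS_DATUM holds dates; `plot_variable` is a hypothetical helper name:

```r
# Line plot of one column of a DWD table against the measurement date.
plot_variable <- function(clim, var = "TMK.Lufttemperatur", ...) {
  plot(clim$MESS_DATUM, clim[[var]], type = "l",
       xlab = "Date", ylab = var, ...)
  invisible(clim[[var]])  # return the plotted values for further use
}
```

`plot_variable(clim)` draws the temperature; any other column name can be passed as `var`.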

Try to plot the sunshine duration (SDK.Sonnenscheindauer) for the same period.

This seems to be tightly correlated with the temperature!

We can quantify the correlation between temperature and sunshine duration:
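The correlation can be computed with `cor()`. A sketch, assuming Pearson correlation (the default of `cor()`; the method behind the value printed below is not stated) and dropping days where either value is missing; `cor_columns` is a hypothetical helper:

```r
# Correlation between two columns of a DWD table, using complete pairs only.
cor_columns <- function(clim, x = "TMK.Lufttemperatur",
                        y = "SDK.Sonnenscheindauer", method = "pearson") {
  cor(clim[[x]], clim[[y]], use = "complete.obs", method = method)
}
```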

[1] 0.6359194

c. comparing locations

Is Freiburg warmer than Potsdam? Let us get the data from these 2 locations and compare the monthly temperatures over the last 2 years:
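A sketch for fetching both stations at once, assuming the recent monthly climate files (`res = "monthly"`, `per = "recent"`) and that the names "Freiburg" and "Potsdam" match entries in rdwd's station index; `get_monthly_recent` is a hypothetical helper:

```r
# Hypothetical helper: download the recent monthly climate table of each
# station with rdwd and return them as a list named by station.
get_monthly_recent <- function(stations = c("Freiburg", "Potsdam")) {
  tables <- lapply(stations, function(st) {
    link <- rdwd::selectDWD(st, res = "monthly", var = "kl", per = "recent")
    rdwd::dataDWD(link, read = TRUE, varnames = TRUE)
  })
  names(tables) <- stations
  tables
}
```

`monthly <- get_monthly_recent()` would then give `monthly$Freiburg` and `monthly$Potsdam`. We assume the monthly mean temperature, precipitation and sunshine columns start with `MO_TT`, `MO_RR` and `MO_SD_S`; check `names(monthly$Potsdam)` before relying on this.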

More rain? More heat?

Are these differences statistically significant? Or could it be simply due to statistical fluctuations in this time period? Since the time points match between the 2 locations, we can perform a paired t-test:
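A sketch of the paired test, assuming the two monthly tables are matched on MESS_DATUM so that each pair of values refers to the same month; `paired_station_test` is a hypothetical helper:

```r
# Paired t-test of one variable at two stations, pairing values by month.
paired_station_test <- function(tab1, tab2, var) {
  # Keep only months present in both tables; the variable gets the
  # suffixes ".1" and ".2" to tell the stations apart
  pairs <- merge(tab1[, c("MESS_DATUM", var)], tab2[, c("MESS_DATUM", var)],
                 by = "MESS_DATUM", suffixes = c(".1", ".2"))
  # t.test() drops pairs in which either value is missing
  t.test(pairs[[paste0(var, ".1")]], pairs[[paste0(var, ".2")]],
         paired = TRUE)
}
```

Call it with the Freiburg table as `tab1`, the Potsdam table as `tab2`, and `var` set to the name of the rain, sunshine or temperature column.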


    Paired t-test

data:  rain.fr and rain.po
t = 2.789, df = 17, p-value = 0.01259
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  7.056499 50.899056
sample estimates:
mean of the differences 
               28.97778 

    Paired t-test

data:  sun.fr and sun.po
t = -1.0272, df = 18, p-value = 0.318
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -40.70075  13.97128
sample estimates:
mean of the differences 
              -13.36474 

    Paired t-test

data:  temp.fr and temp.po
t = 1.0981, df = 18, p-value = 0.2866
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2441508  0.7788877
sample estimates:
mean of the differences 
              0.2673684 

For which of these variables is the difference significant?

3. Going back in time…

Climate change happens on a longer time scale than 2 years, so we need to download historical data! We will use monthly intervals:
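A sketch for the historical download, assuming the monthly climate file with `per = "historical"` and Potsdam as the station; `get_monthly_historical` is a hypothetical helper:

```r
# Hypothetical helper: long-term monthly climate record of one station.
get_monthly_historical <- function(station = "Potsdam") {
  link <- rdwd::selectDWD(station, res = "monthly", var = "kl",
                          per = "historical")
  # varnames = TRUE is passed on to readDWD() to expand the column names
  rdwd::dataDWD(link, read = TRUE, varnames = TRUE)
}
```

`clim <- get_monthly_historical()` would then play the role of the clim table used below.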

a. looking at distributions

Let us explore the distribution of the temperature in one month of the year (for example April) across many years. April is encoded as <year>-04-15 in the column MESS_DATUM of the clim table, so we need to find the rows of the table for which MESS_DATUM contains the pattern -04-15:
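One way to select these rows, assuming MESS_DATUM is stored as a date or date-time (so it is first formatted as text) and anchoring the pattern to the end of the string; `month_rows` is a hypothetical helper:

```r
# Rows of a monthly table whose MESS_DATUM is the 15th of the given month.
month_rows <- function(clim, month = 4) {
  dates <- format(clim$MESS_DATUM, "%Y-%m-%d")  # e.g. "1900-04-15"
  pattern <- sprintf("-%02d-15$", month)        # e.g. "-04-15$"
  clim[grepl(pattern, dates), , drop = FALSE]
}
```

The April temperatures are then one column of `month_rows(clim, 4)`, namely the monthly mean temperature column (assumed to start with `MO_TT`).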

Let us plot the distribution of these values as a histogram:
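A minimal histogram sketch, assuming the April temperatures have been extracted into a numeric vector. `freq = FALSE` scales the y-axis as a density, which makes a later comparison with a probability density possible. `plot_histogram` is a hypothetical helper:

```r
# Density-scaled histogram of a numeric vector (hist() drops missing values).
plot_histogram <- function(values, xlab = "April mean temperature", ...) {
  hist(values, freq = FALSE, main = "", xlab = xlab, ...)
}
```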

Compute the mean and standard deviation of these April temperatures:
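In base R this takes one call per statistic; the sketch below wraps both and ignores missing months (an assumption about how gaps in the record should be handled). `summarise_values` is a hypothetical helper:

```r
# Mean and standard deviation of a numeric vector, skipping NA values.
summarise_values <- function(values) {
  c(mean = mean(values, na.rm = TRUE), sd = sd(values, na.rm = TRUE))
}
```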

We can compare this histogram with a “theoretical” normal distribution with same mean and standard deviation:

Now overlay the histogram with the theoretical distribution:
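A sketch of the overlay with base graphics. The histogram is computed first so the y-axis can be made tall enough for the peak of the normal curve, which is then drawn with `curve()` and `dnorm()` using the sample mean and standard deviation. `plot_hist_with_normal` is a hypothetical helper:

```r
# Density histogram with the normal density of the same mean and sd on top.
plot_hist_with_normal <- function(values, ...) {
  values <- values[!is.na(values)]
  m <- mean(values)
  s <- sd(values)
  h <- hist(values, plot = FALSE)
  # Make room for whichever is higher: the bars or the curve's peak
  ylim <- c(0, max(h$density, dnorm(m, mean = m, sd = s)))
  plot(h, freq = FALSE, ylim = ylim, main = "", ...)
  curve(dnorm(x, mean = m, sd = s), from = min(h$breaks),
        to = max(h$breaks), add = TRUE, col = "red", lwd = 2)
  invisible(c(mean = m, sd = s))
}
```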

We can also compare the histogram to a normal distribution using a QQ-plot:
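A sketch with base R's `qqnorm()` and `qqline()`: if the values are normally distributed, the points should lie close to the line, which `qqline()` draws through the first and third quartiles. `qq_normal` is a hypothetical helper:

```r
# Normal QQ-plot with a reference line through the quartiles.
qq_normal <- function(values) {
  values <- values[!is.na(values)]
  q <- qqnorm(values, main = "Normal QQ-plot")
  qqline(values, col = "red")
  invisible(q)  # theoretical (x) and sample (y) quantiles
}
```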

Kind of…

b. looking at time series

Like we did for the recent data, we can now look at the monthly temperatures for April over a long time range.
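A sketch for plotting one calendar month across all years, assuming `clim` is the historical monthly table and `var` names its mean temperature column (assumed to start with `MO_TT`); `plot_month_series` is a hypothetical helper:

```r
# Plot the values of one calendar month (1 = January) across all years.
plot_month_series <- function(clim, var, month = 4, ...) {
  keep <- which(as.integer(format(clim$MESS_DATUM, "%m")) == month)
  sub <- clim[keep, , drop = FALSE]
  plot(sub$MESS_DATUM, sub[[var]], type = "l",
       xlab = "Year", ylab = var, ...)
  invisible(sub[[var]])  # the plotted values, in the table's row order
}
```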

Do we see a trend? Fake news? Try to overlay the temperature profile for July onto this plot.

Is there a correlation between time and temperature? We can encode time as a numerical vector (1, 2, …) and compute a Spearman correlation between this time vector and the temperatures:
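A sketch of this computation. Encoding time as 1, 2, … in table order assumes the values are sorted chronologically with one value per year. `trend_spearman` is a hypothetical helper:

```r
# Spearman rank correlation between time (1, 2, ...) and a series of values.
trend_spearman <- function(values) {
  time_index <- seq_along(values)
  cor(time_index, values, method = "spearman", use = "complete.obs")
}
```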

[1] 0.3427779

Damn… so much for that!!

Can you determine for which month of the year this correlation is highest?

4. Is the trend significant?

Let us compare the April temperatures of the years 1900-1918 and 2000-2018:
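A sketch for extracting the two samples, assuming the historical monthly `clim` table and a mean temperature column `var`. The names temp.19 and temp.20 follow the test output below, while `values_in_years` is a hypothetical helper:

```r
# Values of one variable for a given calendar month within a set of years.
values_in_years <- function(clim, var, years, month = 4) {
  yr <- as.integer(format(clim$MESS_DATUM, "%Y"))
  mo <- as.integer(format(clim$MESS_DATUM, "%m"))
  clim[[var]][which(mo == month & yr %in% years)]
}
```

For example, `temp.19 <- values_in_years(clim, var, 1900:1918)` and `temp.20 <- values_in_years(clim, var, 2000:2018)`.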

Let us visualize the data as a boxplot:
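A minimal boxplot sketch, assuming the two samples are numeric vectors; the labels and the y-axis title are illustrative defaults, and `compare_boxplot` is a hypothetical helper:

```r
# Side-by-side boxplots of two samples.
compare_boxplot <- function(early, late, labels = c("1900-1918", "2000-2018"),
                            ylab = "April mean temperature") {
  boxplot(list(early, late), names = labels, ylab = ylab)
}
```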

There is a visible difference between these 2 distributions, but is this difference really statistically significant?
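The output below comes from R's `t.test()`. A sketch of a call producing this kind of output, assuming the two samples temp.19 and temp.20: with two independent samples and the default `var.equal = FALSE`, R runs Welch's test, and `alternative` can be set to `"less"` or `"greater"` for a one-sided test. `compare_periods` is a hypothetical helper:

```r
# Two-sample t-test of two independent samples; var.equal = FALSE (R's
# default) gives Welch's test, which does not assume equal variances.
compare_periods <- function(early, late, alternative = "two.sided") {
  t.test(early, late, var.equal = FALSE, alternative = alternative)
}
```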


    Welch Two Sample t-test

data:  temp.19 and temp.20
t = -4.6079, df = 35.689, p-value = 5.026e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.653731 -1.419953
sample estimates:
mean of x mean of y 
 7.648421 10.185263 

What kind of test was performed? Interpret the output of the test! Try performing a one-sided t-test.

Can you see a similar effect for other climate variables (rain, wind,…)?

5. Additional analysis

  • Redo this analysis for other climate variables, such as rain or sunshine duration. Do you also see a temporal trend?

6. Further data

  • Check on Kaggle for related datasets with climate data: here is a list
  • Additional R packages and tools here

Ashwini Kumar Sharma, Carl Herrmann

2019-11-29