Green Friday - Is climate change real?
1. Climate change: reality or fake news?
Everyone is talking about climate change, and many doubt that it really exists… But climate science is one of the best-documented scientific areas, and there are tons of data available; we just need to crunch the numbers with the right tools!
So let us use the German weather records and see whether the effects of climate change in Germany (temperatures, rain, …) can be seen in the data. We will use the data from the Deutscher Wetterdienst (DWD), which is made available through the R library rdwd (see this page). This library contains functions that allow us to query the huge database of the DWD.
2. Let's get started
a. accessing the data
First, we need to install and load the rdwd package:
## install the package
#install.packages("rdwd") # You might need to install RTools to build the rdwd package; see https://cran.r-project.org/bin/windows/Rtools/
## load the package
library(rdwd)
We can query the database using the name of a weather station (e.g. Potsdam) and collect a time series of various weather variables. This time series is available at different time intervals, for example daily or monthly. If you want to use another location, you can check this interactive map and look for the blue dots.
## this creates a link to the requested data
link = selectDWD("Potsdam", res="daily", var="kl", per="recent")
## this downloads the corresponding file
file = dataDWD(link, read=FALSE, dir="~/", quiet=TRUE, force=NA, overwrite=TRUE)
## and this reads the content of the file into R
clim = readDWD(file, varnames=TRUE)
Have a look at the downloaded table to see what kind of variables are available:
STATIONS_ID MESS_DATUM QN_3 FX.Windspitze FM.Windgeschwindigkeit QN_4
1 3987 2018-05-27 10 9.2 3.2 3
2 3987 2018-05-28 10 10.0 4.5 3
3 3987 2018-05-29 10 12.5 5.2 3
4 3987 2018-05-30 10 10.8 3.6 3
5 3987 2018-05-31 10 8.1 3.3 3
RSK.Niederschlagshoehe RSKF.Niederschlagsform SDK.Sonnenscheindauer
1 0.4 6 10.250
2 0.0 0 13.433
3 0.0 0 15.383
4 0.0 0 10.317
5 0.0 0 11.733
SHK_TAG.Schneehoehe NM.Bedeckungsgrad VPM.Dampfdruck PM.Luftdruck
1 0 4.3 15.7 1008.94
2 0 4.5 16.0 1007.71
3 0 0.8 13.3 1004.02
4 0 3.3 15.6 1002.91
5 0 3.2 16.6 1003.17
TMK.Lufttemperatur UPM.Relative_Feuchte TXK.Lufttemperatur_Max
1 20.5 67.58 28.4
2 24.3 56.46 32.2
3 25.6 43.17 32.6
4 24.1 53.42 33.2
5 24.4 57.63 31.9
TNK.Lufttemperatur_Min TGK.Lufttemperatur_5cm_min eor
1 13.8 11.4 eor
2 15.8 13.1 eor
3 18.2 14.9 eor
4 18.2 15.1 eor
5 17.1 14.4 eor
b. comparing variables
We have downloaded a recent dataset containing the data for the last 2 years; we can have a look at the time series for a certain variable, for example TMK.Lufttemperatur: we will plot the data with time on the x-axis and temperature on the y-axis.
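A minimal sketch of such a plot with base R graphics. To keep the example runnable without a download, it builds a small synthetic stand-in table called clim.demo (an assumption for illustration, not DWD data) with the same column names as clim; with the real data, replace clim.demo by clim.

```r
## synthetic stand-in for the downloaded table (NOT real DWD data)
set.seed(1)
days = seq(as.Date("2018-05-27"), by="day", length.out=365)
## crude seasonal cycle plus noise, only to have something to plot
season = cos(2*pi*(as.numeric(format(days, "%j")) - 200)/365)
clim.demo = data.frame(MESS_DATUM=days,
                       TMK.Lufttemperatur=10 + 10*season + rnorm(365, sd=3))
## time on the x-axis, daily mean temperature on the y-axis
plot(clim.demo$MESS_DATUM, clim.demo$TMK.Lufttemperatur, type="l",
     xlab="Date", ylab="Daily mean temperature (deg C)",
     main="Stand-in temperature series")
```

Using type="l" draws a line instead of points, which is usually easier to read for a daily series.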
Try to plot the sunshine duration for the same period.
We can look for correlations between temperature and sunshine duration:
[1] 0.6359194
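A value like the one above is what cor() returns. The sketch below uses synthetic stand-in vectors (sun.demo and temp.demo are hypothetical names, not DWD data) and assumes a Pearson correlation, which is the default of cor(); the original analysis may have used different settings.

```r
## synthetic stand-in vectors (NOT real DWD data), with one missing value
set.seed(2)
sun.demo = runif(100, min=0, max=15)             # hours of sunshine per day
temp.demo = 5 + 1.2*sun.demo + rnorm(100, sd=4)  # temperature loosely tied to sunshine
temp.demo[10] = NA
## Pearson correlation; use="complete.obs" ignores days with missing values
r = cor(temp.demo, sun.demo, use="complete.obs")
print(r)
```

With the downloaded table, the same call would be applied to clim$TMK.Lufttemperatur and clim$SDK.Sonnenscheindauer. Without use="complete.obs", a single missing value makes the result NA.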
c. comparing locations
Is Freiburg warmer than Potsdam? Let us get the data from these 2 locations and compare the monthly temperatures over the last 2 years:
## Start with Potsdam
## this creates a link to the requested data
link = selectDWD("Potsdam", res="monthly", var="kl", per="recent")
## this downloads the corresponding file
file = dataDWD(link, read=FALSE, dir="~/", quiet=TRUE, force=NA, overwrite=TRUE)
## and this reads the content of the file into R
clim.po = readDWD(file, varnames=TRUE)
## same for Freiburg
## this creates a link to the requested data
link = selectDWD("Freiburg", res="monthly", var="kl", per="recent")
## this downloads the corresponding file
file = dataDWD(link, read=FALSE, dir="~/", quiet=TRUE, force=NA, overwrite=TRUE)
## and this reads the content of the file into R
clim.fr = readDWD(file, varnames=TRUE)
More rain? More heat?
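## monthly precipitation sum (mm)
rain.po = clim.po$MO_RR.Niederschlagshoehe
rain.fr = clim.fr$MO_RR.Niederschlagshoehe
## monthly sunshine duration (hours)
sun.po = clim.po$MO_SD_S.Sonnenscheindauer
sun.fr = clim.fr$MO_SD_S.Sonnenscheindauer
## monthly mean air temperature (°C)
temp.po = clim.po$MO_TT.Lufttemperatur
temp.fr = clim.fr$MO_TT.Lufttemperatur
## group the two stations per variable, so that boxplot() draws them side by side
rain = list(Freiburg=rain.fr,
Potsdam=rain.po)
sun = list(Freiburg=sun.fr,
Potsdam=sun.po)
temp = list(Freiburg=temp.fr,
Potsdam=temp.po)
## arrange the plots on a 2x2 grid with narrow margins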
par(mfrow=c(2,2),mar=c(2,2,2,2))
boxplot(rain,main='Rain')
boxplot(sun,main='Sunshine')
boxplot(temp,main='Temperature')
Are these differences statistically significant? Or could they simply be due to statistical fluctuations in this time period? Since the time points match between the 2 locations, we can perform a paired t-test:
Paired t-test
data: rain.fr and rain.po
t = 2.789, df = 17, p-value = 0.01259
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7.056499 50.899056
sample estimates:
mean of the differences
28.97778
Paired t-test
data: sun.fr and sun.po
t = -1.0272, df = 18, p-value = 0.318
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-40.70075 13.97128
sample estimates:
mean of the differences
-13.36474
Paired t-test
data: temp.fr and temp.po
t = 1.0981, df = 18, p-value = 0.2866
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2441508 0.7788877
sample estimates:
mean of the differences
0.2673684
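Outputs of this form are produced by t.test() with paired=TRUE. A minimal sketch on synthetic stand-in vectors (x.demo and y.demo are hypothetical, not DWD data) shows the call and how a missing value affects the degrees of freedom:

```r
## synthetic stand-in for 19 matching monthly values at two sites (NOT real DWD data)
set.seed(3)
x.demo = rnorm(19, mean=60, sd=25)
y.demo = x.demo - 20 + rnorm(19, sd=15)
y.demo[5] = NA     # a pair with a missing value is dropped from the test
## paired test: works on the month-by-month differences x.demo - y.demo
res = t.test(x.demo, y.demo, paired=TRUE)
print(res)
res$p.value
```

With 19 pairs and one missing value, 18 complete pairs remain and the test reports df = 17. This may explain why the rain comparison above has one degree of freedom less than the other two, although that is an assumption about the underlying data.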
For which of these variables is the difference significant?
3. Going back in time…
Climate change happens on a longer time scale than 2 years. Hence, we need to download historical data! We will use monthly intervals:
## this creates a link to the requested data
link = selectDWD("Potsdam", res="monthly", var="kl", per="historical")
## this downloads the corresponding file
file = dataDWD(link, read=FALSE, dir="~/", quiet=TRUE, force=NA, overwrite=TRUE)
## and this reads the content of the file into R
clim = readDWD(file, varnames=TRUE)
a. looking at distributions
Let us explore the distribution of temperature in one month of the year (for example April) across many years: April will be encoded as <year>-04-15 in the column MESS_DATUM of the clim table. We need to find the rows of the table for which the column MESS_DATUM contains the pattern <year>-04-15:
## the function grep searches for a certain string in a vector of strings, and returns the indices of the entries which contain this string
rows.april = grep('04-15',clim$MESS_DATUM)
##
year = clim$MESS_DATUM[rows.april]
temp.april = clim$MO_TT.Lufttemperatur[rows.april]
Let us plot the distribution of these values as a histogram:
Compute the mean and standard deviation of these April temperatures:
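A minimal sketch of this step, using a synthetic stand-in vector temp.demo (hypothetical, not DWD data) in place of temp.april. The names m and s are chosen to match the dnorm() call used in this section.

```r
## synthetic stand-in for the April temperatures (NOT real DWD data)
set.seed(4)
temp.demo = rnorm(120, mean=9, sd=1.8)
temp.demo[3] = NA
## na.rm=TRUE skips months with a missing value
m = mean(temp.demo, na.rm=TRUE)
s = sd(temp.demo, na.rm=TRUE)
c(mean=m, sd=s)
```

With the real data, replace temp.demo by temp.april.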
We can compare this histogram with a “theoretical” normal distribution with the same mean and standard deviation:
## generate a vector of x values from 4 to 14 with 0.1 increments
x.norm = seq(4,14,by=.1)
## compute the y value according to a normal distribution
y.norm = dnorm(x.norm,mean=m,sd=s)
Now overlay the histogram with the theoretical distribution:
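One way to sketch the overlay, again on a synthetic stand-in vector (temp.demo is hypothetical, not DWD data). The important detail is freq=FALSE, which puts the histogram on a density scale so that it is comparable to the output of dnorm().

```r
## synthetic stand-in for the April temperatures (NOT real DWD data)
set.seed(5)
temp.demo = rnorm(120, mean=9, sd=1.8)
m = mean(temp.demo)
s = sd(temp.demo)
## theoretical normal curve with the same mean and standard deviation
x.norm = seq(4, 14, by=.1)
y.norm = dnorm(x.norm, mean=m, sd=s)
## freq=FALSE: bar areas sum to 1, like the area under the density curve
h = hist(temp.demo, freq=FALSE, breaks=15,
         xlab="Temperature (deg C)", main="Stand-in April temperatures")
lines(x.norm, y.norm, col="red", lwd=2)
```

With the default freq=TRUE the bars would show counts, and the density curve would appear as a nearly flat line at the bottom of the plot.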
We can also compare the histogram to a normal distribution using a QQ-plot:
Kind of…
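A QQ-plot of this kind can be sketched with qqnorm() and qqline() from base R. The vector temp.demo below is a synthetic stand-in (not DWD data); with the real data, use temp.april.

```r
## synthetic stand-in for the April temperatures (NOT real DWD data)
set.seed(6)
temp.demo = rnorm(120, mean=9, sd=1.8)
## sample quantiles against the quantiles of a standard normal distribution
qq = qqnorm(temp.demo, main="Normal QQ-plot (stand-in data)")
## reference line through the first and third quartiles
qqline(temp.demo, col="red")
```

Points lying close to the reference line suggest that the data are roughly normally distributed; systematic bends at the ends point to heavier or lighter tails.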
b. looking at time series
As we did for the recent data, we can now look at the monthly temperatures for April over a long time range.
Do we see a tendency? Fake news? Try to overlay the temperature profile for July onto this plot.
Is there a correlation between time and temperature? We can encode time as a numerical vector (1,2,…) and compute a Spearman correlation between this time vector and the temperatures:
[1] 0.3427779
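A value of this kind comes from cor() with method="spearman". The sketch below uses a synthetic stand-in series (temp.demo is hypothetical, not DWD data) with a small built-in upward drift, and assumes one value per year without gaps.

```r
## synthetic stand-in: one April value per year, slight upward drift (NOT real DWD data)
set.seed(7)
n = 126
temp.demo = 8 + 0.015*(1:n) + rnorm(n, sd=1.8)
## encode time as 1, 2, 3, ...
time.index = seq_along(temp.demo)
## Spearman correlation is rank based, so it only assumes a monotonic relation
rho = cor(time.index, temp.demo, method="spearman", use="complete.obs")
rho
```

With the real data, replace temp.demo by temp.april. To obtain a p-value along with the coefficient, cor.test() accepts the same method argument.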
Damn… so that's that!
Can you determine for which month of the year this correlation is highest?
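One possible approach is to repeat the grep() and cor() steps for each of the 12 months. The sketch below runs on a synthetic stand-in table (clim.demo is hypothetical, not DWD data) that mimics the MESS_DATUM encoding used above, and assumes one row per month without gaps.

```r
## synthetic stand-in monthly table (NOT real DWD data): 50 years x 12 months
set.seed(10)
dates = seq(as.Date("1950-01-15"), by="month", length.out=600)
month.num = as.numeric(format(dates, "%m"))
clim.demo = data.frame(MESS_DATUM=dates,
    MO_TT.Lufttemperatur=10 - 9*cos(2*pi*month.num/12) + rnorm(600, sd=1.5))
## Spearman correlation between year index and temperature, month by month
rho = sapply(1:12, function(mo) {
    ## build the pattern "-01-15", "-02-15", ... and find the matching rows
    rows = grep(sprintf("-%02d-15", mo), as.character(clim.demo$MESS_DATUM))
    temp = clim.demo$MO_TT.Lufttemperatur[rows]
    cor(seq_along(temp), temp, method="spearman", use="complete.obs")
})
names(rho) = month.abb
rho
## month with the highest correlation
names(which.max(rho))
```

The stand-in data contain no trend, so the values printed here only reflect noise; the point is the loop structure, which can be applied unchanged to clim.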
4. Is the trend significant?
Let us compare the April temperature of the years 1900-1918 and 2000-2018:
# extract the rows corresponding to April
clim.month = clim[grep('04-15',clim$MESS_DATUM),]
##
## now extract the rows corresponding to the early and late time period:
i.early = which(clim.month$MESS_DATUM_BEGINN >= 19000000 & clim.month$MESS_DATUM_ENDE <= 19190000)
i.late = which(clim.month$MESS_DATUM_BEGINN >= 20000000 & clim.month$MESS_DATUM_ENDE <= 20190000)
##
temp.19 = clim.month$MO_TT.Lufttemperatur[i.early]
temp.20 = clim.month$MO_TT.Lufttemperatur[i.late]
Let us visualize the data as a boxplot:
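A minimal sketch of such a boxplot. The two vectors are synthetic stand-ins (early.demo and late.demo are hypothetical, not DWD data); with the real data, use temp.19 and temp.20.

```r
## synthetic stand-ins for the two 19-year periods (NOT real DWD data)
set.seed(8)
early.demo = rnorm(19, mean=7.5, sd=1.5)
late.demo = rnorm(19, mean=10, sd=1.5)
## a named list gives one labelled box per period
bp = boxplot(list("1900-1918"=early.demo, "2000-2018"=late.demo),
             ylab="April mean temperature (deg C)",
             main="Stand-in comparison of two periods")
```

Passing a named list keeps the group labels on the x-axis without building a data frame first.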
There is obviously a difference in these 2 distributions; but is this difference really statistically significant?
## is there a significant difference between the early twentieth and the early twenty-first century?
t.test(temp.19,temp.20)
Welch Two Sample t-test
data: temp.19 and temp.20
t = -4.6079, df = 35.689, p-value = 5.026e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.653731 -1.419953
sample estimates:
mean of x mean of y
7.648421 10.185263
What kind of test was performed? Interpret the output of the test! Try performing a one-sided t-test.
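A one-sided test is requested through the alternative argument of t.test(). The sketch below uses synthetic stand-in vectors (early.demo and late.demo are hypothetical, not DWD data) and tests whether the early period is colder than the late one.

```r
## synthetic stand-ins for the two periods (NOT real DWD data)
set.seed(9)
early.demo = rnorm(19, mean=7.5, sd=1.5)
late.demo = rnorm(19, mean=10, sd=1.5)
## two-sided Welch test (the default of t.test)
two = t.test(early.demo, late.demo)
## one-sided: alternative hypothesis is mean(early) < mean(late)
one = t.test(early.demo, late.demo, alternative="less")
c(two.sided=two$p.value, one.sided=one$p.value)
```

When the observed difference points in the direction of the alternative, the one-sided p-value is half the two-sided one. With the real data, the call would be t.test(temp.19, temp.20, alternative="less").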
Can you see a similar effect for other climate variables (rain, wind,…)?
5. Additional analysis
- redo this analysis for other climate variables, such as rain or sunshine duration. Do you also see a temporal trend?
6. Further data
- Check on Kaggle for related datasets with climate data: here is a list
- Additional R packages and tools here