--- title: "Expectation Maximization" author: "Carl Herrmann" date: "24/01/2020" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` We present a simulation to explain the principles of Expectation Maximization (EM). We assume that 2 coins are used, which are biased in a different way, and can yield 1 or 2 as output. We observe the outcome of a certain number of draws (n series of k throws). We define the way in which these coins are biased, and check of the EM algorithm is able to find these hidden parameters: ```{r} set.seed(123) bias = c('blue'=0.75,'red'=0.22) ``` For each color, the bias indicates the probability to get a 1. We now define the parameters of the game: ```{r} K = 10 # number of throws per series N = 5 # number of series ## this function throws the dice dice = function(bias) { ifelse(runif(1)>= bias, 2, 1) } ## this function plays the game sequences = lapply(1:N,function(i) { color = ifelse(runif(1)>0.7,'red','blue') sapply(1:K,function(j) { dice(as.numeric(bias[color])) }) }) head(sequences) ``` We initialize the parameters of the EM: ```{r} ## random starting point for the parameters t = c(0.8,0.7) ``` This is the total number of 1 and 2 in each series: ```{r} n = do.call('rbind',lapply(sequences,function(x) { c(sum(x==1),sum(x==2)) })) head(n) ``` We now run the EM algorithm over 30 iterations: ```{r} T=matrix(t,ncol=2) for (iteration in 1:30) { p=do.call('rbind',lapply(1:nrow(n),function(i) { x=n[i,] y = c(t[1]^x[1]*(1-t[1])^x[2], t[2]^x[1]*(1-t[2])^x[2]) y/sum(y) })) X = cbind(p[,1]*n,p[,2]*n) XX = apply(X,2,sum) t=c(XX[1]/sum(XX[1:2]),XX[3]/sum(XX[3:4])) T=rbind(T,t) } ``` T contains the estimate for the parameters, i.e. probability of 1 for blue or red: ```{r} head(T) ``` We can plot the estimate of the parameters over different iterations: ```{r} plot(1:nrow(T),T[,1],ylim=c(0,1),type='b',col='blue',xlab='Iterations',ylab='Parameters',lwd=2);points(1:nrow(T),T[,2],type='b',col='red',lwd=2);abline(h=bias,lty=3);title(paste0('N = ',N,'; K = ',K)) ```