# Confidence Intervals, simulated vs theoretical

Assume we have two candicates, T and D, who are the two candidates running for election in a small village. You poll 1099 inhabitants, and of them 544 say the will vote for D, and 555 say they will vote for T.

In a table, it’ll look like this:

table(voter)
D T
544 555

We’re fairly sure about the proportions of these votes, as the following:

D_prop <- 544/1099
D_prop
##  0.4949955
T_prop <- 555/1099
T_prop
##  0.5050045

Ie, around 49.5 % for D, and 50.5 % for T.

## How sure are we about these results?

Confidence Intervals (CI:s) to the rescue!

First, lets look at simulated CI:s

D_1000 <- voter %>%
specify(response = value, success = "D") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.95) %>%
mutate(party = "D",simulated = "Yes", reps = 1000)

T_1000 <- voter %>%
specify(response = value, success = "T") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.95) %>%
mutate(party = "T", simulated = "Yes", reps = 1000)

ci_voters <- bind_rows(T_1000, D_1000)
ci_voters
lower_ci upper_ci party simulated reps
0.4749773 0.5341219 T Yes 1000
0.4649682 0.5232257 D Yes 1000

Ok, so if we simulate this result 1000 times using a bootstraping method, and a CI for 95 %, we’re 95 % sure that the true population mean is between 47.6 % to 53.4 % for T, with a mean of 50.5 %, and between 46.2 % and 52.2 % for D, with a mean of 49.5 %.

How does this compare with theoretical results? Lets use the following formula for CI:s:

$\hat{p} \pm z \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

meaning that for T, $$\hat{p}$$ = ca 0.505, z = ca 1.96 for a 95 % CI, and n = 1099.

p_hat <- voter %>%
specify(response = value, success = "T") %>%
calculate(stat = "prop") %>% pull(stat)

n <- 1099
z<- qnorm(0.975) #since BOTH tails are counted, and 2.5 % are at both end, this is the Z-score.

z*sqrt((p_hat*(1-p_hat))/n)
##  0.02955953
T_theory <- tibble(lower_ci = p_hat-z*sqrt((p_hat*(1-p_hat))/n),
upper_ci = p_hat+z*sqrt((p_hat*(1-p_hat))/n)) %>%
mutate(party = "T",
simulated = "No",
reps = 0)

kable(bind_rows(T_1000, T_theory),digits = 3)
lower_ci upper_ci party simulated reps
0.475 0.534 T Yes 1000
0.475 0.535 T No 0


r
p_hat <- voter %>%
specify(response = value, success = "D") %>%
calculate(stat = "prop") %>% pull(stat)

n <- 1099
z<- 1.960

z*sqrt((p_hat*(1-p_hat))/n)
##  0.02956007
D_theory <- tibble(lower_ci = p_hat-z*sqrt((p_hat*(1-p_hat))/n),
upper_ci = p_hat+z*sqrt((p_hat*(1-p_hat))/n)) %>%
mutate(party = "D",
simulated = "No",
reps = 0)

kable(bind_rows(D_1000, D_theory), digits = 3)`
lower_ci upper_ci party simulated reps
0.465 0.523 D Yes 1000
0.465 0.525 D No 0

As we can see, the CI:s are close for the simulated ones compared to the theoretical ones, but the code for the simulated ones are easier to follow and write. ##### Leo Carlsson

My research interests include science, statistics and politics.