The Birthday Simulation

Nothing novel or unique about this problem.  This just extends the problem to measure the probability to three or more people sharing the same birthday using simulation approaches. Though there are other ways to approach with problem with built-in functions the example below show some of the individual steps.

For two people it’s fairly straight forward and with a group of about 22 people the probability that two people share the same birthday is about 0.5.  For groups approaching 50 there is an extremely high probability that two people share the same birthday



When determining that three (or more) people have the same birthday the probability decreases fairly quickly compared to measuring only two people.  A fairly large group would be needed to find three people with the same birthday.

Birthday Plot


Here is some R code to determine these probabilities.

n.rep = 5000
theta.val = 75
doy = seq(from=1, to=365, by=1)
sim.mat = matrix(NA, nrow=theta.val, ncol=4)

getProb = function(n){
q = 1 - seq(0,n-1)/365
p = 1 - prod(q)

theta.list = seq(from=2, to=75, by=1)
p.graph = sapply(theta.list, getProb)
fifty.fifty = which(p.graph >.5)[1]
plot(p.graph, main="Probability Two People Have the Same Birthday", ylab='Probability', xlab="Number of People in Group")
abline(h=.5, v=fifty.fifty)


## For matching multiple people

## Runs a little slow.
for(i in 2:theta.val){
bday = replicate(n.rep, sample(doy, size=i, replace=T) )
bday.table = apply(bday, 2, table)

sim.2 = ifelse( unlist( lapply(bday.table, max) ) >=2, 1, 0)
sim.3 = ifelse( unlist( lapply(bday.table, max) ) >=3, 1, 0)
sim.4 = ifelse( unlist( lapply(bday.table, max) ) >=4, 1, 0)

sim.mat[i,1] = i
sim.mat[i,2] = sum(sim.2)/length(sim.2)
sim.mat[i,3] = sum(sim.3)/length(sim.3)
sim.mat[i,4] = sum(sim.4)/length(sim.4)


graph.sim = t( sim.mat[,2:4] )
colnames(graph.sim) = sim.mat[,1]

barplot(graph.sim[1,], ylim=c(0,1), col="red",
main="Probability of Having Multiple People with the Same Birthday",
xlab="People with Birthday",
barplot(graph.sim[2,], ylim=c(0,1), col="blue", add=T)
barplot(graph.sim[3,], ylim=c(0,1), col="black", add=T)
legend("topleft", c("2","3","4"), col=c("red","blue","black"), lwd=3)

Leave a comment


  1. Ken Knoblauch

     /  May 22, 2014

    or try looking at ?pbirthday… Might speed things up.

    • Wesley

      Look at that. There’s a function for the theoretical probability. Good to know.


Leave a Reply

Your email address will not be published. Required fields are marked *

three + = 7

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

%d bloggers like this: