Overview

In the coupon collector problem, you collect coupons one day at a time, each time getting a coupon of a random type, from among \(n\) distinc types. We’re interested in the number of days it takes until we’ve collected all \(n\) types. Since on each day, we might get a coupon type we’ve received before, it might take us longer than \(n\) days to collect all the types. See the coupon collector worksheet for details. To simulate the process of collecting coupons until we’ve collected all the types, we’ll follow this general structure:

  1. Each day we’ll sample a coupon of random type, each type equally likely (using the sample function).
  2. Then we’ll check whether we’ve collected this before (using an if statement).
  3. If we haven’t collected that type before, we’ll make note of its type, to keep a record of collected types, and increase our count of how many types have been collected.
  4. We’ll repeat the above steps until all types have been collected (using a while loop).

The code

We’ll code the above summary into a function called coupon. It takes as input the number \(n\) of coupon types, and gives as output the number of days it took to collect all types.

coupon <- function(n) {
  # Input: n = number of coupon types
  # Output: number of days until all n types have been collected
  # Summary: this function simulates the coupon collector problem.  Each time we run the coupon function,
  #           we'll get the random number of days it took to collect all n coupon types
  
  count <- 0 # how many coupons have been collected
  days <- 0 # how many days have passed
  collectedCouponTypes <- numeric(n) # a vector of 0's and 1's; i'th entry is 1 if coupon type i has been collected
  
  while(count < n) { 
    # the code below will run until count = n, when we've collected all the coupon types
    
    todaysCouponType <- sample(1:n, size = 1)
    if (collectedCouponTypes[todaysCouponType] == 0) { # means we havent collected today's coupon type yet
      collectedCouponTypes[todaysCouponType] <- 1
      count <- count + 1
    }
    days <- days + 1
  }
  
  return(days)
}

Trying our function

The coupon collector process is random, so each time we run the coupon function, we should expected a different output. Here is one run with 4 coupon types.

coupon(4)
## [1] 6

Here is another.

coupon(4)
## [1] 21

One more.

coupon(4)
## [1] 5

Checking the theory

Of course, a big reason to do simulation is to check the whether our analytical work is correct. In class, we let \(X\) denote the number of days it took to collect all the coupon types. We found that \[E[X] = n\left(1 + \frac 12 + \cdots + \frac 1n\right).\] Let’s see if we can check this using the Strong Law of Large Numbers. The code below will

  1. Simulate many, say 1 million, trials of the coupon collector process by running coupon 1 million times and recording the result. (This means generating \(X_1,X_2,\ldots,X_{1,000,000}\) i.i.d. samples of \(X\).)
  2. Take an arithmetic mean of our results from the million trials. (This means computing \(\frac{X_1 + \cdots + X_{1,000,000}}{n}\) and using it to approximate \(E[X]\).)
n <- 4
simlist <- replicate(1e6, coupon(n))
mean(simlist)
## [1] 8.33022

The exact value of \(E[X]\) from the theory is

print(n*sum(1/(1:n)))
## [1] 8.333333