In the coupon collector problem, you collect coupons one day at a time, each time getting a coupon of a random type, from among \(n\) distinc types. We’re interested in the number of days it takes until we’ve collected all \(n\) types. Since on each day, we might get a coupon type we’ve received before, it might take us longer than \(n\) days to collect all the types. See the coupon collector worksheet for details. To simulate the process of collecting coupons until we’ve collected all the types, we’ll follow this general structure:

- Each day we’ll sample a coupon of random type, each type equally likely (using the
`sample`

function). - Then we’ll check whether we’ve collected this before (using an
`if`

statement). - If we haven’t collected that type before, we’ll make note of its type, to keep a record of collected types, and increase our count of how many types have been collected.
- We’ll repeat the above steps until all types have been collected (using a
`while`

loop).

We’ll code the above summary into a function called `coupon`

. It takes as input the number \(n\) of coupon types, and gives as output the number of days it took to collect all types.

```
coupon <- function(n) {
# Input: n = number of coupon types
# Output: number of days until all n types have been collected
# Summary: this function simulates the coupon collector problem. Each time we run the coupon function,
# we'll get the random number of days it took to collect all n coupon types
count <- 0 # how many coupons have been collected
days <- 0 # how many days have passed
collectedCouponTypes <- numeric(n) # a vector of 0's and 1's; i'th entry is 1 if coupon type i has been collected
while(count < n) {
# the code below will run until count = n, when we've collected all the coupon types
todaysCouponType <- sample(1:n, size = 1)
if (collectedCouponTypes[todaysCouponType] == 0) { # means we havent collected today's coupon type yet
collectedCouponTypes[todaysCouponType] <- 1
count <- count + 1
}
days <- days + 1
}
return(days)
}
```

The coupon collector process is random, so each time we run the `coupon`

function, we should expected a different output. Here is one run with 4 coupon types.

`coupon(4)`

`## [1] 6`

Here is another.

`coupon(4)`

`## [1] 21`

One more.

`coupon(4)`

`## [1] 5`

Of course, a big reason to do simulation is to check the whether our analytical work is correct. In class, we let \(X\) denote the number of days it took to collect all the coupon types. We found that \[E[X] = n\left(1 + \frac 12 + \cdots + \frac 1n\right).\] Let’s see if we can check this using the Strong Law of Large Numbers. The code below will

- Simulate many, say 1 million, trials of the coupon collector process by running
`coupon`

1 million times and recording the result. (This means generating \(X_1,X_2,\ldots,X_{1,000,000}\) i.i.d. samples of \(X\).) - Take an arithmetic mean of our results from the million trials. (This means computing \(\frac{X_1 + \cdots + X_{1,000,000}}{n}\) and using it to approximate \(E[X]\).)

```
n <- 4
simlist <- replicate(1e6, coupon(n))
mean(simlist)
```

`## [1] 8.33022`

The exact value of \(E[X]\) from the theory is

`print(n*sum(1/(1:n)))`

`## [1] 8.333333`