This section introduces the following three ideas through an example:

  1. the probability mass function of a random variable \(X\)
  2. the expectation \(E[X]\) of the random variable
  3. how to approximate \(E[X]\) through simulation

Probability mass function

In the game of Scrabble, there are 100 tiles, each inscribed with a letter from the alphabet and a point value. Let’s focus on the point values only. Among the 100 tiles, the point values can be repeated and the table below gives the frequencies for each point value.

point value   frequency
          0           2
          1          68
          2           7
          3           8
          4          10
          5           1
          8           2
         10           2

The following R code makes a vector of point values, and a vector of probabilities (based on the frequencies) for each point value.

pointValues <- c(0,1,2,3,4,5,8,10)
frequencies <- c(2,68,7,8,10,1,2,2)/100 # We divide by 100, the total number of tiles
names(frequencies) <- pointValues
##    0    1    2    3    4    5    8   10 
## 0.02 0.68 0.07 0.08 0.10 0.01 0.02 0.02

The first row gives the possible point values on the tiles, and the second row gives their respective relative frequencies.
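As a quick sanity check (a small addition, not in the original, using only base R), the relative frequencies form a valid PMF: they are nonnegative and sum to 1.

```r
frequencies <- c(2, 68, 7, 8, 10, 1, 2, 2) / 100

# A PMF must have nonnegative entries that sum to 1; all.equal allows
# for floating-point rounding in the sum.
stopifnot(all(frequencies >= 0))
stopifnot(isTRUE(all.equal(sum(frequencies), 1)))
```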

Let \(X\) be the point value of a randomly selected Scrabble tile. The probability mass function (PMF) of \(X\) is the function \(m(k)\) given by \[m(k) = P(X = k), \quad \mbox{for $k = 0,1,2,3,4,5,8,10$.}\] The table is thus a way of expressing the PMF of \(X\). We can also visualize the PMF using a histogram:

barplot(height = frequencies, names.arg = pointValues, xlab = "point values", ylab = "probability", main = "Histogram of point values")

Note: the PMF of a random variable isn’t really a new idea; we’ve also used the alternative terms probability function of \(X\) or distribution of \(X\) for this function.


Expectation

The expectation of \(X\) is defined as \[E[X] = \sum_k kP(X=k).\] It’s a weighted average of the possible values of \(X\), with weights given by probabilities. If we think of \(P(X=k)\) as a measure of mass (as the name PMF suggests), then \(E[X]\) is simply an expression of the center of mass formula from physics.

Using the center of mass analogy, we can look at the histogram of \(X\) in our Scrabble example, and guess that \(E[X]\) is close to 1 or 2 since there’s so much mass over 1. Here’s a direct computation:

expectation <- sum(pointValues*frequencies)
## [1] 1.87
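As an aside (not part of the original computation), base R’s `weighted.mean` computes the same weighted average; since the probabilities sum to 1, its normalization by `sum(w)` changes nothing.

```r
pointValues <- c(0, 1, 2, 3, 4, 5, 8, 10)
frequencies <- c(2, 68, 7, 8, 10, 1, 2, 2) / 100

# weighted.mean(x, w) computes sum(x * w) / sum(w); with sum(w) = 1
# this is exactly sum(k * P(X = k)).
weighted.mean(pointValues, w = frequencies)
## [1] 1.87
```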


Approximating \(E[X]\) through simulation

The Strong Law of Large Numbers (SLLN) is a theorem which says that if \(X_1, X_2, \ldots\) is an i.i.d. sequence of random variables with the same distribution as a random variable \(X\), with \(E[X] < \infty\), then \[\lim_{n \to \infty} \frac{X_1 + \cdots + X_n}{n} = E[X] \quad \mbox{with probability 1.}\] This theorem allows us to approximate the expected value of a random variable \(X\) as follows:

  1. Take \(n\) independent samples of \(X\). (These are the \(X_1,\ldots, X_n\) in the SLLN.)
  2. Take the arithmetic mean of these samples. (This is approximately \(E[X]\) when \(n\) is large.)

The following code does this approximation of \(E[X]\) for our Scrabble example:

n <- 1e5
simlist <- sample(pointValues, size = n, prob = frequencies, replace = TRUE)
mean(simlist)
## [1] 1.87596

Notice that the prob argument to sample lets us specify the probability of selecting each value in the set from which we sample.
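To see the SLLN at work, here is an illustrative sketch (not in the original; the name runningMean and the seed are added for this example) that tracks the sample mean of the first \(k\) samples as \(k\) grows:

```r
set.seed(1)  # added for reproducibility

pointValues <- c(0, 1, 2, 3, 4, 5, 8, 10)
frequencies <- c(2, 68, 7, 8, 10, 1, 2, 2) / 100

n <- 1e5
simlist <- sample(pointValues, size = n, prob = frequencies, replace = TRUE)

# Running mean of the first k samples, for k = 1, ..., n.
runningMean <- cumsum(simlist) / seq_len(n)

# Inspect the running mean at a few checkpoints; by the SLLN it
# settles near E[X] = 1.87 as k grows.
runningMean[c(10, 100, 1000, n)]
```

Plotting runningMean against the sample index (for example with plot(runningMean, type = "l")) shows the trajectory fluctuating early on and then stabilizing near 1.87.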