A Short Note on Stratified Sampling

Table of Contents

Stratified sampling is a Monte Carlo (MC) method to estimate an integral that consist a distribution Sorry, your browser does not support SVG. and a function of interest Sorry, your browser does not support SVG.

Sorry, your browser does not support SVG. 1

via partitioning Sorry, your browser does not support SVG. into smaller groups, called strata. It is a type of quasi-Monte Carlo (QMC) method as it introduces deterministic procedures into the MC estimation. Same as crude Monte Carlo the estimation is still unbiased; however, the variance of the estimator can be smaller than crude MC.

(Crude) Monte Carlo

The (crude) Monte Carlo estimator of Sorry, your browser does not support SVG. is

Sorry, your browser does not support SVG.

where Sorry, your browser does not support SVG. and Sorry, your browser does not support SVG. is the number of MC samples.

This estimator is an unbiased1 estimator of the original target Sorry, your browser does not support SVG.:

Sorry, your browser does not support SVG.

Now let’s look at the variance of the estimator Sorry, your browser does not support SVG.. We first need to define Sorry, your browser does not support SVG., which is the variance of Sorry, your browser does not support SVG. under the distribution Sorry, your browser does not support SVG.. Now the variance of interest is

Sorry, your browser does not support SVG.

As it can be seen, the variance drops with the order of Sorry, your browser does not support SVG. by using more samples.

Example

I now use an area estimation example2 to demonstrate how the empirical estimate of the variance of the crude Monte Carlo estimator changes as the number of samples increases. The area Sorry, your browser does not support SVG. I am interested in estimating is illustrated in the figure below.[area]

S.png

Figure 1: The area Sorry, your browser does not support SVG.

This is equivalent to computing the integral below

Sorry, your browser does not support SVG.

where Sorry, your browser does not support SVG. is the indicator function and Sorry, your browser does not support SVG. is the uniform distribution over Sorry, your browser does not support SVG..

The crude MC estimator for Sorry, your browser does not support SVG. is then

Sorry, your browser does not support SVG.

where Sorry, your browser does not support SVG. and Sorry, your browser does not support SVG. is the MC samples.

I now visualise how the variance of Sorry, your browser does not support SVG. changes as Sorry, your browser does not support SVG. increases.

crude_mc.png

Figure 2: Variance convergence of crude MC

Note that both the x- and y-axis are in logarithm scale. As we can see, it matches our analysis results, which is Sorry, your browser does not support SVG..3

Stratified sampling

Now, we look at how stratified sampling works and how the variance of the corresponding estimator changes depending on different stratifications. For the sake of simplicity, we consider the case where Sorry, your browser does not support SVG. a Sorry, your browser does not support SVG.-dimensional uniform distribution for the integral in equation 1.4 We first partition Sorry, your browser does not support SVG. into Sorry, your browser does not support SVG. strata s.t. Sorry, your browser does not support SVG.. Then, as the simplest way of performing stratified sampling, we uniformly draw independent Monte Carlo samples for each stratum Sorry, your browser does not support SVG., and the overall estimator is given as below

Sorry, your browser does not support SVG.

where Sorry, your browser does not support SVG. are samples that lie in Sorry, your browser does not support SVG. and Sorry, your browser does not support SVG. is the number of MC samples in stratum Sorry, your browser does not support SVG..

The mean of Sorry, your browser does not support SVG. within each stratum Sorry, your browser does not support SVG. is

Sorry, your browser does not support SVG.

and the corresponding variance is

Sorry, your browser does not support SVG.

As it is mentioned in the beginning, Sorry, your browser does not support SVG. is still an unbiased estimator of Sorry, your browser does not support SVG.. We can validate this by computing the mean of Sorry, your browser does not support SVG.

Sorry, your browser does not support SVG.

The variance of Sorry, your browser does not support SVG. is then

Sorry, your browser does not support SVG.

Effect of different allocation strategies

We are interested in comparing this estimator against (crude) Monte Carlo. In particular, we’d like to show that when Sorry, your browser does not support SVG. is allocated proportionally to Sorry, your browser does not support SVG. the estimator can only decrease in variance as compared to (crude) Monte Carlo.

In order to show this, we need to rewrite the variance of crude Monte Carlo in terms of strata. Let’s introduce Sorry, your browser does not support SVG. - the stratum that contains Sorry, your browser does not support SVG.: Sorry, your browser does not support SVG.. By the law of total variance, we have

Sorry, your browser does not support SVG.

Thus Sorry, your browser does not support SVG.

Note that in the case of proportional allocation, because Sorry, your browser does not support SVG.,

Sorry, your browser does not support SVG.

which is smaller than Sorry, your browser does not support SVG.. Although it is not the optimal allocation5, proportional allocation as applied to stratified sampling will never increase the variance over that of crude Monte Carlo. Also, note that a poor allocation can also increase the variance of the estimator in comparison to crude Monte Carlo.

Effect of different stratification methods

It is interesting to note that, if Sorry, your browser does not support SVG. is in 2D and is dominated by the horizontal direction more than the vertical direction, partitioning in the horizontal direction would be more effective than partitioning in the vertical direction. We will see it in the example soon.

Example (continued)

We continue with the example above by looking at 2 ways of stratifying the previously defined area.

  1. Vertically partitioning the space into 10 parts
  2. Horizontally partitioning the space into 10 parts

We illustrate where the samples are located for each method and for crude Monte Carlo in the figure below.

sampling_comparisons.png

Figure 3: Different ways of sampling: crude MC (left), stratified sampling with vertical partitioning (middle) and stratified sampling with horizontal paritioning (right)

Notice that there are 100 samples in total for each method and that, for the stratified sampling methods, there are exactly 10 samples in each box.

Now what is interesting is the rate of variance of each estimator as the number of samples increase (shown below).

crude_mc-vs-quasi_mc.png

Figure 4: Variance convergence: crude MC v.s. quasi MC with different paritioning

As it can be seen, both QMC methods converge faster than crude Monte Carlo, as expected by our analysis. What’s more interesting is that horizontally partitioning the space is more effective than vertically partitioning it. This is because the pre-defined shape is dominated by the y-axis and not the x-axis. This domination can be explained by the shape’s symmetry along the y-axis and not the x-axis which means that the y-axis contains more information about the shape.


1

The estimator Sorry, your browser does not support SVG. is an unbiased estimator of Sorry, your browser does not support SVG. if Sorry, your browser does not support SVG..

2

The exact area is Sorry, your browser does not support SVG..

3

The plot of Sorry, your browser does not support SVG. in a log-log plot is a straight line.

4

Stratification doesn’t lose generality as any Sorry, your browser does not support SVG. of interest can be reformulated to take uniform samples.

5

The optimal allocation is taking Sorry, your browser does not support SVG.

Author: Kai Xu

Created: 2021-06-04 Fri 16:13