Multivariate Statistical Process Control method comparison, Part 2: Generating control limits and an initial look

Spencer R Hall
5 min read · May 26, 2021


Earlier this year, I posted the first part of a write-up for a project I worked on last year.

That project involved taking 4 different types of correlation (copulas), testing the baseline plus 8 different data shifts, and, for each of these combinations, generating 1000 simulations of 1000 points each. I then compared how well three different multivariate control charts, Hotelling T², MEWMA, and MCUSUM, detected a shift while holding the false alarm rate fixed across all simulations.

Today, I use the fairly standard in-control average run length (ARL) of 300. This means that, on average, when the process is in control, you can expect to see one out-of-control signal in any 300 sequential points. The simple way to achieve this is to calculate the control limits for each control chart from the in-control data so that this constraint holds. In the code below, the variable cARL is set to 300 to represent constraining the in-control ARL to 300.

# Upper control limits (UCLs) from the in-control (shift 0/0) data:
# the (cARL - 1)/cARL quantile of each chart statistic gives an
# in-control ARL of roughly cARL, one limit per copula/rho combination.
t2ucl <- simData %>%
  filter(shift == "0/0") %>%
  group_by(copula, rho) %>%
  summarize(UCLt2 = quantile(t2, probs = (cARL - 1)/cARL)) %>%
  select(copula, rho, UCLt2) %>%
  spread(rho, UCLt2) %>%
  column_to_rownames(var = "copula")

mewmaucl <- simData %>%
  filter(shift == "0/0") %>%
  group_by(copula, rho) %>%
  summarize(UCLme = quantile(mewma, probs = (cARL - 1)/cARL)) %>%
  select(copula, rho, UCLme) %>%
  spread(rho, UCLme) %>%
  column_to_rownames(var = "copula")

mcusumucl <- simData %>%
  filter(shift == "0/0") %>%
  group_by(copula, rho) %>%
  summarize(UCLmc = quantile(mcusum, probs = (cARL - 1)/cARL)) %>%
  select(copula, rho, UCLmc) %>%
  spread(rho, UCLmc) %>%
  column_to_rownames(var = "copula")
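
As a quick sanity check (an illustrative sketch only, assuming the simData columns from Part 1), a limit placed at the (cARL - 1)/cARL quantile should be exceeded by roughly 1 in every cARL in-control points:

# Sketch: the fraction of in-control T2 values above the limit should be
# about 1/cARL, i.e. roughly 1/300 here.
simData %>%
  filter(shift == "0/0") %>%
  group_by(copula, rho) %>%
  summarize(exceedRate = mean(t2 > quantile(t2, probs = (cARL - 1)/cARL)))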

From these control limits, we can test the out-of-control data and determine the ARL, where a lower value means a shift is detected more quickly. For ease of use, if a run has zero points out of control, I treat its ARL as equal to the size of the run, which in each of these cases is 1000.

# For each copula/rho combination, compare each run against the matching
# UCLs. For in-control (0/0) runs, the ARL is the run length divided by
# the number of out-of-control points; for shifted runs, it is the index
# of the first point beyond the UCL.
tmpData <- list()
index <- 1
for(i in 1:Ncopula) {
  for(j in 1:Nrho) {
    tmpData[[index]] <- splitData[[index]] %>%
      group_by(copula, rho, shift, iteration) %>%
      summarize(t2ARL = ifelse(shift == "0/0", size/sum(t2 > t2ucl[i, j]),
                               detect_index(t2, function(z) (z > t2ucl[i, j]))),
                meARL = ifelse(shift == "0/0", size/sum(mewma > mewmaucl[i, j]),
                               detect_index(mewma, function(z) (z > mewmaucl[i, j]))),
                mcARL = ifelse(shift == "0/0", size/sum(mcusum > mcusumucl[i, j]),
                               detect_index(mcusum, function(z) (z > mcusumucl[i, j])))) %>%
      # A run with nothing out of control gives Inf (in-control case) or 0
      # (detect_index finds no match); treat both as an ARL equal to the
      # run length, per the convention above.
      mutate(t2ARL = ifelse(is.infinite(t2ARL) | t2ARL == 0, size, t2ARL),
             meARL = ifelse(is.infinite(meARL) | meARL == 0, size, meARL),
             mcARL = ifelse(is.infinite(mcARL) | mcARL == 0, size, mcARL))
    index <- index + 1
  }
}
rm(splitData)

All that remains is to get those ARL values into a form that is easy to visualize, and to generate confidence intervals for the simulation as a whole.

# Combine the per-run ARLs and, for each chart, compute the mean ARL with
# a normal-theory confidence interval (mean_cl_normal, from ggplot2/Hmisc)
# for every copula, rho, and shift.
ARL <- bind_rows(tmpData) %>%
  group_by(copula, rho, shift) %>%
  summarize(t2ci = list(mean_cl_normal(t2ARL) %>%
                          rename(t2ARLmean = y, t2ARLlwr = ymin, t2ARLupr = ymax)),
            meci = list(mean_cl_normal(meARL) %>%
                          rename(meARLmean = y, meARLlwr = ymin, meARLupr = ymax)),
            mcci = list(mean_cl_normal(mcARL) %>%
                          rename(mcARLmean = y, mcARLlwr = ymin, mcARLupr = ymax))) %>%
  unnest(cols = c(t2ci, meci, mcci))

Now that we have our control chart ARL metrics, let’s check that the in-control ARL is approximately 300. This corresponds to the rows where the shift column is 0/0, meaning the mean of both variables is 0. I’m selecting columns to exclude the upper and lower bounds of the confidence intervals.

ARL %>%
  filter(shift == "0/0") %>%
  select(c(1:4, 7, 10))
# A tibble: 16 x 6
# Groups: copula, rho [16]
copula rho shift t2ARLmean meARLmean mcARLmean
<fct> <fct> <fct> <dbl> <dbl> <dbl>
1 Normal 0.6 0/0 300. 299. 299.
2 Normal 0.2 0/0 300. 300. 299.
3 Normal -0.2 0/0 300. 300. 299.
4 Normal -0.6 0/0 300. 299. 301.
5 Frank 0.6 0/0 300. 299. 300.
6 Frank 0.2 0/0 300. 299. 300.
7 Frank -0.2 0/0 300. 300. 298.
8 Frank -0.6 0/0 299. 299. 301.
9 Clayton 0.6 0/0 300. 300. 300.
10 Clayton 0.2 0/0 300. 299. 299.
11 Clayton -0.2 0/0 299. 300. 299.
12 Clayton -0.6 0/0 299. 300. 299.
13 Gumbel 0.6 0/0 300. 301. 299.
14 Gumbel 0.2 0/0 300. 299. 300.
15 Gumbel -0.2 0/0 299. 299. 301.
16 Gumbel -0.6 0/0 299. 300. 299.

Things are looking good, with everything right around 300. There are a lot of plots we could make to visualize this, but to keep the illustration simple, let’s start with the simplest, the Hotelling T² control chart: a sample control chart, plus the ARL values for both in-control and out-of-control runs across the different correlations.

Let’s take a look at the raw data and the Hotelling T² control chart for two runs of the Normal copula with a correlation of 0.6: one with no shift, and one where just one of the two variables has shifted by 3.
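
The original post showed these as figures; a minimal sketch of how such a chart could be drawn with ggplot2 is below (the run filter, the iteration value, and the oneRun name are illustrative assumptions, and t2ucl is indexed by the copula rowname and rho column name created above):

# Sketch only, not the original figure code. Assumes simData carries an
# iteration column (as used in the loop above) and that rho is stored as
# a factor, so it is matched as a string.
oneRun <- simData %>%
  filter(copula == "Normal", rho == "0.6", shift == "0/0", iteration == 1) %>%
  mutate(obs = row_number())

ggplot(oneRun, aes(x = obs, y = t2)) +
  geom_point(aes(color = t2 > t2ucl["Normal", "0.6"])) +
  geom_hline(yintercept = t2ucl["Normal", "0.6"], linetype = "dashed") +
  labs(x = "Observation", y = "Hotelling T2", color = "Above UCL",
       title = "Normal copula, rho = 0.6, no shift")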

In the no-shift case, only 3 points fall above the upper control limit (UCL), which is roughly what we would expect for 1000 points with an in-control ARL of 300 (about 1000/300 ≈ 3.3 false alarms).

When one of the variables is simulated out of control with a shift of 3, many points exceed the UCL and register as out of control. This is exactly what we would like to happen: in a production environment, we would want to detect this change and take action to correct it.

Finally, let’s look at an overall summary of the performance for Hotelling T² charts for these different copulas and correlations.
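
The summary figures in the original post were images; one possible way to produce a comparable overview with ggplot2 (a sketch only, using the column names from the ARL tibble built above) is:

# Sketch of a summary plot: mean T2 ARL with confidence intervals by
# shift, one panel per copula, colored by correlation. Not the original
# figure code.
ARL %>%
  ggplot(aes(x = shift, y = t2ARLmean, color = rho, group = rho)) +
  geom_line() +
  geom_errorbar(aes(ymin = t2ARLlwr, ymax = t2ARLupr), width = 0.2) +
  facet_wrap(~copula) +
  scale_y_log10() +
  labs(x = "Shift", y = "Average run length (log scale)",
       title = "Hotelling T2 ARL by copula and correlation")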

The curious thing we see in these four graphs is that how well the T² control chart detects different shifts varies with both the copula type and the strength of the correlation. For my next and final post on this project, I want to delve into the different types of correlation we’re working with here, how they are structured, and how that changes the performance of the Hotelling T² control chart.

Originally published at https://heril.github.io/
