Saturday 8 February 2014

The propagation of measurement errors

Measurements are usually divided into two groups: direct and derived. In this latter case we may have two separate direct measurements (Q and W) and need to calculate a linear combination Z = AQ + BW, where A and B are two generic coefficients. We already know that Q and W are affected by random measurement errors (and are thus random variables), and we may wonder how such errors propagate to Z.
We'll start by answering a simpler question: “I have the estimate Q with variance q; what is the variance (var) of A + BQ?” (A and B are, again, two generic coefficients).
The answer is very simple:
\[ var(A + BQ) = {B^2} var(Q) = {B^2} q \]
Indeed, adding the constant A does not change the variance, while multiplying by B scales it by B². This is very easy to prove; let's just see a numerical example: consider three values (e.g. 12, 15, 19), with mean equal to 15.33 and variance equal to 12.33. If we transform each of the three values by multiplying it by 3 and adding 5, we can easily verify that the variance of the transformed sample is 3² × 12.3333 = 111.
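We can quickly verify this in R:
x <- c(12, 15, 19)
var(x)
## [1] 12.33
var(5 + 3 * x)
## [1] 111
3^2 * var(x)
## [1] 111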
And what if we have two estimates Q and W, with variances respectively equal to q and w, and we want to estimate the variance of A Q + B W?
In this case we need to know whether the two quantities Q and W are independent or, on the contrary, what their covariance (COV) is. It can be easily proved that:
\[ var(AQ + BW) = {A^2}var(Q) + {B^2}var(W) + 2AB\,cov(Q,W) \]
A formal proof can be found in any statistics book, e.g. in Sokal and Rohlf (1981) at p. 818. We'll show this with an R example:
Q <- c(12, 14, 11, 9)
W <- c(2, 4, 7, 8)
a <- 2
b <- 3
# variance of the linear combination, computed directly...
var(a * Q + b * W)
## [1] 35.58
# ...and by the propagation formula
a^2 * var(Q) + b^2 * var(W) + 2 * a * b * cov(Q, W)
## [1] 35.58
This implies that, for two uncorrelated variables (i.e. with zero covariance), the variance of the sum equals the sum of the variances.
Furthermore, if we remember the following equality:
\[r = \frac{cov(Q,W)}{\sqrt {var(Q)var(W)} }\]
where r is the correlation coefficient, then:
\[cov(Q,W) = r sd(Q)sd(W)\]
where sd is the standard deviation. If we would rather use r instead of the covariance, the equation above becomes:
\[ var(AQ + BW) = {A^2}var(Q) + {B^2}var(W) + 2AB\,r\,sd(Q)\,sd(W) \]
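This identity is easy to verify in R, reusing the Q and W vectors defined above:
cov(Q, W)
## [1] -4.167
cor(Q, W) * sd(Q) * sd(W)
## [1] -4.167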
And what if we have more than two quantities to combine?
The above equation can be generalised as follows (in words): the variance of a linear combination of n random variables (X1, X2, X3, …, Xn) with n coefficients (B1, B2, B3, …, Bn) is equal to the sum of the squares of the coefficients, each multiplied by the variance of the corresponding variable, plus twice the product of each pair of coefficients multiplied by the covariance of the corresponding pair of variables.
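In symbols, the same statement reads:
\[ var\left( \sum\limits_{i=1}^{n} B_i X_i \right) = \sum\limits_{i=1}^{n} {B_i^2}\, var(X_i) + 2 \sum\limits_{i < j} B_i B_j\, cov(X_i, X_j) \]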
This may be easily posed in matrix notation as:
\[ var(AX) = A \Sigma A^T \]
where X is the vector of random variables to be combined, A is the vector of coefficients and Σ is the variance-covariance matrix of X. For example, if we want to assess the variance of the sum of three quantities equal to 3, 6 and 4, where all coefficients are equal to one, variances are all equal to 0.5 and covariances are all equal to 0.75, we can do this with the following code:
# vector of quantities and row vector of coefficients
X <- matrix(c(3, 6, 4), 3, 1)
A <- matrix(c(1, 1, 1), 1, 3)
# the linear combination itself
A %*% X
##      [,1]
## [1,]   13
# the variance-covariance matrix of X
(Sigma <- matrix(c(0.5, 0.75, 0.75, 0.75, 0.5, 0.75, 0.75, 0.75, 0.5), 3, 3, 
    byrow = TRUE))
##      [,1] [,2] [,3]
## [1,] 0.50 0.75 0.75
## [2,] 0.75 0.50 0.75
## [3,] 0.75 0.75 0.50
# the variance of the linear combination
A %*% Sigma %*% t(A)
##      [,1]
## [1,]    6
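As a sanity check, the same value can be obtained with the scalar formula above: with unit coefficients, it is the sum of the three variances plus twice the three covariances:
3 * 0.5 + 2 * 3 * 0.75
## [1] 6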
Two final notes relating to linear transformations are very important:
  1. The variance estimated in this way is exact, not an approximation (this holds for linear transformations; it is not generally true for non-linear ones).
  2. If the original variables are gaussian, the transformed variable is also gaussian.
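As a minimal simulation sketch of both points (assuming, just for illustration, two independent gaussian variables, each with variance 0.5, so that the variance of their sum is exactly 1):
set.seed(1234)
X1 <- rnorm(100000, mean = 3, sd = sqrt(0.5))
X2 <- rnorm(100000, mean = 6, sd = sqrt(0.5))
# the empirical variance of the sum should be very close to the exact value 1
var(X1 + X2)
# and a histogram of the sum should look gaussian
# hist(X1 + X2)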
