I am currently analysing microarray data. There are 20 arrays, divided into 5 animals and 4 treatments. It is a repeated-measurements experiment.
I have done PCA to see if the treatments can explain the variance and saw one array quite far from the rest of the arrays. This happens both when using the whole data set (PCA1 = -200, PCA2 = 50) and when using only the differentially expressed genes (PCA1 = -100, PCA2 = -30).
A more graduated approach might be to use arrayWeights, which should assign a lower weight to any outlier array with variable signal relative to its replicates. This reduces its influence on the linear modelling, DE testing, etc. without requiring the drastic action of tossing out the array altogether. I prefer not to remove arrays if possible, as that means I'm throwing out data and reducing residual d.f. to estimate the variance/power to detect DE (as you might have witnessed yourself, from the reduction in DE genes when the affected animal is removed). It's also hard to draw the line between what is an outlier and what isn't when you have small numbers of samples.
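As a rough illustration of the approach described above, here is a minimal sketch using simulated data. The 5 animals x 4 treatments layout follows the question, but the expression matrix, the factor names, the choice of array 7 as the noisy one, and the additive animal + treatment design are all assumptions made for the example, not the poster's actual setup.

```r
library(limma)

set.seed(1)

# Hypothetical layout matching the thread: 5 animals x 4 treatments = 20 arrays.
animal    <- factor(rep(paste0("A", 1:5), each = 4))
treatment <- factor(rep(paste0("T", 1:4), times = 5))

# Simulated log-expression matrix (1000 genes x 20 arrays).
# Array 7 is deliberately made noisier to mimic an outlier array.
y <- matrix(rnorm(1000 * 20), nrow = 1000)
y[, 7] <- y[, 7] + rnorm(1000, sd = 3)

# Animal is included as a blocking factor (an assumption for this sketch),
# so treatment effects are estimated within animals.
design <- model.matrix(~ animal + treatment)

# Estimate one relative quality weight per array.
aw <- arrayWeights(y, design)

# Pass the array weights to the linear model fit, then moderate as usual.
fit <- lmFit(y, design, weights = aw)
fit <- eBayes(fit)

# Inspect the weights; the noisy array should receive a low weight.
round(aw, 2)
```

The array stays in the analysis, so the residual degrees of freedom are unchanged; only its contribution to each gene's fit is reduced.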
Dear Aaron,
thank you very much (again).
I have applied arrayWeightsSimple as in the example in ?arrayWeights.
The number of DEGs now increases. I guess this makes sense, since the variance estimate is no longer inflated once the outlier array is down-weighted.
However, this works within the linear model and I would like to visualize the weighted arrays in a new PCA or dendrogram.
Is it correct if I do the following?
Thanks!
I don't think it makes sense to interpret the weights as scaling factors. Rather, they modify the expression for the sum-of-squares to be minimised, when solving the linear system in lmFit. As such, you'll need to figure out if your downstream processes have an analogous objective function, in order to get a consistent interpretation for the weights. I've heard of methods for weighted PCA, but I haven't used them so I can't vouch for how sensible they are.
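To make the distinction concrete, here is a small base R sketch of ordinary weighted least squares on a single simulated response. It is only an analogue of what happens per gene, not limma's internal code, and the data, the weight vector, and the variable names are all made up for illustration. It compares three fits: the weighted sum-of-squares solved directly, lm() with a weights argument, and a naive fit where the data are simply multiplied by the weights.

```r
set.seed(2)

n <- 20
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Hypothetical weights: observation 7 is down-weighted, as an outlier array would be.
w <- rep(1, n)
w[7] <- 0.1

X <- cbind(1, x)

# Weighted least squares: minimise sum(w * (y - X %*% b)^2).
# The normal equations are (X' W X) b = X' W y.
b_wls <- solve(t(X) %*% (w * X), t(X) %*% (w * y))

# lm() with a weights argument minimises the same weighted sum of squares.
b_lm <- coef(lm(y ~ x, weights = w))

# Naively multiplying the response by the weights is a different problem.
b_scaled <- coef(lm(I(w * y) ~ x))

same_as_lm     <- isTRUE(all.equal(as.numeric(b_wls), as.numeric(b_lm)))
same_as_scaled <- isTRUE(all.equal(as.numeric(b_lm), as.numeric(b_scaled)))

print(c(same_as_lm = same_as_lm, same_as_scaled = same_as_scaled))
```

The first comparison agrees and the second does not, which is why multiplying the expression values by the array weights before a PCA or clustering would not reproduce what the weights do inside the linear model.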