This experiment involves 6 subjects: 3 animals that died after infection by a virus and 3 animals that survived.
From each subject, blood was collected before any treatment (D0) and at different time points after infection (as the animals did not die on the same day, some time points are missing for some animals). There is a single time point (Day 3) for which we have data in both dead and alive animals. So, I would like to identify genes that respond differently at Day 3 in the Alive animals relative to the Dead animals.
The correlation is very low (0.18). Should I perform the analysis without duplicateCorrelation?
Your current approach seems sensible to me. A low correlation is not a problem - it just means that your batch effect is weak, which is a good thing. Indeed, as the correlation gets smaller and smaller, your results with duplicateCorrelation should converge to what you would get if you didn't use duplicateCorrelation at all. For example:
library(limma)

# Simulate a pure-noise matrix: 10 samples in 2 groups and 5 batches
set.seed(12345)
a <- matrix(rnorm(100000), ncol=10)
group <- gl(2, 5)
batch <- gl(5, 2)
design <- model.matrix(~group)

# Ordinary fit, ignoring the batches
fit1 <- lmFit(a, design)
fit1 <- eBayes(fit1)
topTable(fit1)

# Blocked fit with the correlation forced to exactly zero
fit2 <- lmFit(a, design, correlation=0, block=batch)
fit2 <- eBayes(fit2)
topTable(fit2) # should be the same as above
So you can see that having "too low" a correlation poses no danger to the analysis with duplicateCorrelation, because it will automatically converge to the analysis without. The only real disadvantage is that it'll take longer to run, but with 6 samples this is not really an issue. Besides, it is quite difficult to tell whether a correlation of 0.18 is big or small in terms of its ultimate effect on the p-values; so you might as well take it into account if you can.
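For completeness, the usual two-step workflow is to estimate the consensus correlation first and then pass it back into lmFit. This is a minimal sketch with simulated data; `expr`, `group`, and `animal` are hypothetical stand-ins for your own expression matrix and sample annotation:

```r
library(limma)

# Simulated data for illustration: 6 samples from 3 hypothetical animals,
# with each animal measured once in each of two conditions
set.seed(1)
expr <- matrix(rnorm(6000), ncol=6)
group <- gl(2, 3)
animal <- factor(c(1, 2, 3, 1, 2, 3))

design <- model.matrix(~group)

# Step 1: estimate the consensus correlation between samples
# from the same animal
corfit <- duplicateCorrelation(expr, design, block=animal)
corfit$consensus.correlation

# Step 2: feed the estimated correlation back into the fit
fit <- lmFit(expr, design, block=animal,
             correlation=corfit$consensus.correlation)
fit <- eBayes(fit)
topTable(fit)
```

Whatever the estimated correlation turns out to be, this is the fit that converges to the unblocked analysis as the correlation approaches zero.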
Thank you so much for your explanation. Now it's much clearer.
In my case, duplicateCorrelation() blocking on each animal estimates the correlation between measurements made on the same animal; is that why we do not expect a high value (due to the experimental conditions)?
I don't understand your last question. From the description of your experimental design, I have no prior expectations whatsoever about the size of the correlation. If you have high animal-to-animal variability but a consistent response to time in each animal, then the consensus correlation will be high, as most of the variance in the data will be driven by the animal effect. Otherwise, the consensus correlation will be low; it just depends on how consistently your animals behave.
0.18 is not at all a low correlation in this context; in fact it is the sort of correlation one might expect for this sort of analysis. It will have an effect on the p-values, so it isn't ignorable.
A low correlation would be something less than 0.01. As Aaron points out, it would still be fine to use duplicateCorrelation unless the correlation was actually negative.
Very high correlations (> 0.5) are unusual in this context. In such a case, I would consider using a blocked analysis instead.
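A blocked analysis here would mean including the animal as a fixed factor in the design, so that each animal gets its own baseline. A minimal sketch with simulated data; `expr`, `group`, and `animal` are hypothetical stand-ins, assuming each animal is measured under both conditions:

```r
library(limma)

# Simulated paired design: 3 hypothetical animals, each measured
# once in each of two conditions
set.seed(1)
expr <- matrix(rnorm(6000), ncol=6)
group <- gl(2, 3)
animal <- factor(c(1, 2, 3, 1, 2, 3))

# Animal enters the design as a fixed effect, absorbing
# all between-animal differences
design <- model.matrix(~animal + group)
fit <- lmFit(expr, design)
fit <- eBayes(fit)
topTable(fit, coef="group2")
```

Note that a fixed animal effect absorbs all between-animal variation, so a comparison that is itself between animals (such as Alive vs Dead at Day 3) cannot be estimated this way; the blocked design only suits within-animal comparisons.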