I have a data set where samples are grouped according to their health status (3 levels) and sampling timepoints (2 levels for each sample). This would be a simplified version of the data set:
df <- data.frame(status = c( rep("healthy", 4), rep("condition1", 4), rep("condition2", 4) ), time = rep(c("1", "2"), 6), animal = paste0("animal", unlist(lapply(1:6, function(x) rep(x,2)))))
I am interested in detecting differentially expressed genes at each time point between sick and healthy individuals. I am not interested in finding changes over time.
According to the vignette is valid to model differential expression by grouping variables:
df$groups <- paste0(df$status, "time", df$time) model.matrix(~0+groups, df)
I want to do the following 4 contrasts:
groupscondition1time1 vs groupshealthytime1 groupscondition2time1 vs groupshealthytime1 groupscondition1time2 vs groupshealthytime2 groupscondition2time2 vs groupshealthytime2
My question is: should I correct for multple testing across contrasts?
The vignette indicates the following:
"Regarding multiple test correction, if a user is planning to contrast all pairs of many levels, and then selectively reporting the results of only a subset of those pairs, one needs to perform multiple testing across contrasts as well as genes to control for this additional form of multiple testing"
This is not exactly my case, as I have planed the contrasts, but I want to make sure.
Thanks in advance!
The response says:
If you control the FDR at a given level for each of 5 contrasts separately, then you have automatically controlled the FDR at the same level for all 5 contrasts together.
I think is contradictory to what is writen in the DESeq2 Vignette (text cited in the initial post).
I am no expert on multiple test correction, but I would tend to listen to Gordon Smyth, considering it says this in the source code for
p.adjust
:Anyway, my simplistic understanding is this; the BH method in
p.adjust
controls the maximum expected FDR to a certain percentage (say 5%). So if you have a set of p-values and you use the 'BH' method ofp.adjust
, then at most you expect 5% of the genes with an FDR < 0.05 to be false positives. If you have, say, 4 other sets of contrasts, and for those you have also used BH, then the maximum proportion of false positives for each contrast is also 5%.If you then consider all five contrasts together, by definition you have controlled at a 5% FDR, because if there are at most 5% false positives in each contrast, when you pile them all together, there is still at most 5% false positives.
Numerically, consider five contrasts with 100 significant genes, at 5% FDR. At most there are 5 false positives in each contrast. And in total there are 500 significant genes, with at most 25 false positives, which is still a 5% FDR.
I should probably re-write the section of the vignette, because my concern is at the least overstated. What I was concerned about when I wrote that is systematic cherry-picking of contrasts without presenting the actual procedure being used. I've seen code examples doing loops across pairwise comparisons with dozens of groups of samples. 20 groups gives 190 pairs, 30 groups gives 435 pairs, and so on. A simulation shows you can eke out a single-digit number of significant "genes" for cherry-picked contrasts this way. With the run below, you might see "Interestingly, we found that X vs Y resulted in 3 significant genes with E(FDR) < 10%."
It's not a big concern though, as you can't easily get above single digits of genes reported this way (here using actual Uniform R.V.).