Hi all,
I am having a problem with edgeR's FDR output for a sample. Specifically, the FDR calculated for the sample makes all but 1 gene greater than 0.999 FDR.
I have 6 samples with 3 biological replicates per sample (except 1 sample which only has 2, not the sample in question). I'm comparing them all to a control sample.
Here is my code snippet
group <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6)) y <- DGEList(counts = data.data, group = group) design <- model.matrix(~0+group) keep <- which(rowMeans(cpm(y)) > 1) y <- y[keep,] y$samples$lib.size <- colSums(y$counts) y <- calcNormFactors(y) y.tagwiseD <- estimateGLMTagwiseDisp(y, design) fit <- flmFit(y.tagwiseD, design) lrt.pa <- glmLRT(fit, contrast=c(-1,1,0,0,0,0)) lrt.pb <- glmLRT(fit, contrast=c(-1,0,1,0,0,0)) lrt.pc <- glmLRT(fit, contrast=c(-1,0,0,1,0,0)) lrt.pd <- glmLRT(fit, contrast=c(-1,0,0,0,1,0)) lrt.pe <- glmLRT(fit, contrast=c(-1,0,0,0,0,1)) tt.pa <- topTags(lrt.pa, n = Inf, sort.by = "PValue") tt.pb <- topTags(lrt.pb, n = Inf, sort.by = "PValue") tt.pc <- topTags(lrt.pc, n = Inf, sort.by = "PValue") tt.pd <- topTags(lrt.pd, n = Inf, sort.by = "PValue") tt.pe <- topTags(lrt.pe, n = Inf, sort.by = "PValue")
​Here is what the library sizes look like
group lib.size norm.factors control.1 1 7748209 1.0210560 control.2 1 7179951 0.9556734 control.3 1 11971447 1.0134466 pa.1 2 10175798 1.0012736 pa.2 2 9470155 1.0016820 pa.3 2 10123124 1.0584135 pb.1 3 7913483 0.9819921 pb.2 3 8295638 0.9916526 pb.3 3 9695925 0.9868686 pc.1 4 8920315 0.9763594 pc.2 4 8218776 0.9714772 pc.3 4 7328867 1.0263048 pd.1 5 7748209 1.0210560 pd.2 5 11961039 0.9783426 pd.3 5 11204776 1.0162369 pe.1 6 9162241 0.9585401 pe.3 6 11191854 1.0464313
Sample pd is the problem sample. Here is what the sample looks like
> head(tt.pd) Coefficient: -1*group1 1*group5 logFC logCPM LR PValue FDR ENSG00000185483.9 -2.0867536 0.8938299 16.18707 5.738457e-05 0.8752869 ENSG00000183508.4 -1.7070970 3.7773429 14.11071 1.723595e-04 0.9996358 ENSG00000162817.6 -1.2346601 2.8174421 12.05522 5.164758e-04 0.9996358 ENSG00000005243.7 -0.6852432 6.4953765 11.99971 5.320874e-04 0.9996358 ENSG00000138944.7 -2.5864985 0.5451848 11.96504 5.420809e-04 0.9996358 ENSG00000158270.10 -2.0259865 3.1257016 11.55704 6.749312e-04 0.9996358
As you can see, the FDR doesn't really match up with the p-value or the logFC. Interestingly, we have a second control sample in the data set which is sample pe and it shows an FDR distribution like one would expect from comparing two controls, specifically, some changed genes but much much less than treated samples. Here is the toptags for sample pe (the other control)
> head(tt.pe) Coefficient: -1*group1 1*group6 logFC logCPM LR PValue FDR ENSG00000248223.1 -1.728649 5.278270 23.85892 1.036605e-06 0.009185926 ENSG00000001617.9 -1.433806 3.550665 23.57005 1.204475e-06 0.009185926 ENSG00000072041.14 -2.454923 2.091251 19.47446 1.019537e-05 0.040531314 ENSG00000128917.6 -1.039870 3.864531 19.39490 1.062907e-05 0.040531314 ENSG00000182752.9 -1.038423 5.551625 18.94674 1.344191e-05 0.041005891 ENSG00000162654.8 1.323367 2.564031 17.59689 2.730339e-05 0.050695072
So to reiterate my question, why is the FDR distribution so strange for sample pd?
Thanks,
David Oliver
A couple of typos: