Dear all,
In an RNA-seq time course experiment I was analyzing I was expecting identical moderated F-statistics (calculated using topTable
) for a set of contrasts where each timepoint is compared with baseline and the 'equivalent' set of contrasts where each timepoint is compared with the previous timepoint. However, the F-statistics and therefore the corresponding (adjusted) p-values turned out to be different although the ranking of the genes was mostly similar. I can reproduce this using the fission data package, which contains gene counts for an RNA-seq time course of fission yeast (Leong et al. 2014).
library(edgeR) library(limma) library(fission) data(fission) dge <- DGEList(counts=assay(fission), samples=data.frame(colData(fission)), genes=data.frame(rowData(fission))) dge <- calcNormFactors(dge) treatment <- factor(paste(dge$samples$strain, dge$samples$minute, sep="_")) design <- model.matrix(~0+treatment) colnames(design) <- sub("treatment", "", colnames(design)) v <- voom(dge, design) fit <- lmFit(v, design) # Define contrasts for the first three timepoints in the mutant strain my.contrasts <- makeContrasts( mut15_vs_0 = mut_15 - mut_0, mut30_vs_0 = mut_30 - mut_0, mut30_vs_15 = mut_30 - mut_15, levels = design ) fit2 <- contrasts.fit(fit,my.contrasts) fit.eb <- eBayes(fit2)
Since each contrast is completely defined by the other two contrasts I would have expected identical results when using the first two contrasts, i.e comparing 15 and 30 minutes with time 0 (Option I):
topTable(fit.eb, coef=1:2, n=2)
and when using the first and the third contrast, i.e. comparing each timepoint with the previous timepoint (Option II):
topTable(fit.eb, coef=c(1,3), n=2)
This is also stated in section 9.6.1 of the limma user's guide: "Any two contrasts between the three times would give the same result.".
However, output for Option I is:
symbol biotype mut15_vs_0 mut30_vs_0 AveExpr F P.Value adj.P.Val
SPBC365.12c ish1 protein_coding 3.310655 5.499713 6.789024 562.4327 1.537399e-23 6.33079e-20
SPBC4B4.08 ght2 protein_coding 2.616911 4.039343 8.511128 544.2622 2.420539e-23 6.33079e-20
and for Option II:
symbol biotype mut15_vs_0 mut30_vs_15 AveExpr F P.Value adj.P.Val
SPBC365.12c ish1 protein_coding 3.310655 2.189059 6.789024 662.9574 1.575561e-24 1.109037e-20
SPCPB16A4.07 SPCPB16A4.07 protein_coding 5.659329 1.083013 5.525583 615.4990 4.413496e-24 1.553330e-20
Am I overlooking something? Any help would be appreciated.
Best,
Perry
My session info:
Session info ----------------------------------------------------------------------------------------------- setting value version R version 3.4.3 (2017-11-30) system x86_64, mingw32 ui RStudio (1.1.383) language (EN) collate English_United Kingdom.1252 tz Europe/Berlin date 2018-01-31 Packages --------------------------------------------------------------------------------------------------- package * version date source base * 3.4.3 2017-12-06 local Biobase * 2.38.0 2017-10-31 Bioconductor BiocGenerics * 0.24.0 2017-10-31 Bioconductor bitops 1.0-6 2013-08-17 CRAN (R 3.4.1) compiler 3.4.3 2017-12-06 local datasets * 3.4.3 2017-12-06 local DelayedArray * 0.4.1 2017-11-04 Bioconductor devtools * 1.13.4 2017-11-09 CRAN (R 3.4.3) digest 0.6.14 2018-01-14 CRAN (R 3.4.3) edgeR * 3.20.7 2018-01-18 Bioconductor fission * 0.112.0 2018-01-30 Bioconductor GenomeInfoDb * 1.14.0 2017-10-31 Bioconductor GenomeInfoDbData 1.0.0 2018-01-23 Bioconductor GenomicRanges * 1.30.0 2017-10-31 Bioconductor graphics * 3.4.3 2017-12-06 local grDevices * 3.4.3 2017-12-06 local grid 3.4.3 2017-12-06 local IRanges * 2.12.0 2017-10-31 Bioconductor lattice 0.20-35 2017-03-25 CRAN (R 3.4.3) limma * 3.34.5 2017-12-23 Bioconductor locfit 1.5-9.1 2013-04-20 CRAN (R 3.4.3) Matrix 1.2-12 2017-11-20 CRAN (R 3.4.3) matrixStats * 0.53.0 2018-01-24 CRAN (R 3.4.3) memoise 1.1.0 2017-04-21 CRAN (R 3.4.3) methods * 3.4.3 2017-12-06 local parallel * 3.4.3 2017-12-06 local Rcpp 0.12.14 2017-11-23 CRAN (R 3.4.3) RCurl 1.95-4.10 2018-01-04 CRAN (R 3.4.3) rstudioapi 0.7 2017-09-07 CRAN (R 3.4.3) S4Vectors * 0.16.0 2017-10-31 Bioconductor stats * 3.4.3 2017-12-06 local stats4 * 3.4.3 2017-12-06 local SummarizedExperiment * 1.8.1 2017-12-19 Bioconductor tools 3.4.3 2017-12-06 local utils * 3.4.3 2017-12-06 local withr 2.1.1 2017-12-19 CRAN (R 3.4.3) XVector 0.18.0 2017-10-31 Bioconductor yaml 2.1.16 2017-12-12 CRAN (R 3.4.3) zlibbioc 1.24.0 2017-10-31 Bioconductor
Dear Gordon,
I was already suspecting that voom was causing the observed differences. Thanks for confirming this and showing that everything is indeed as expected with limma-trend. What would your recommendation be when using the voom pipeline and having to choose between three 'equivalent' sets of contrasts (coef=c(1,2), coef=c(1,3), coef=c(2,3)) for calculating the F-statistics as in the example above? It is very likely that each set of contrasts will lead to a slightly different list of genes for a specific cut-off on the adjusted P-value. Do you consider these differences negligible in general?