Question

Moderated F-statistics for time course experiment

0

Entering edit mode

Perry Moerland ▴ 130

@perry-moerland-1109

Last seen 3.1 years ago

Bioinformatics Laboratory, Academic Med…

Dear all,

In an RNA-seq time course experiment I was analyzing I was expecting identical moderated F-statistics (calculated using topTable) for a set of contrasts where each timepoint is compared with baseline and the 'equivalent' set of contrasts where each timepoint is compared with the previous timepoint. However, the F-statistics and therefore the corresponding (adjusted) p-values turned out to be different although the ranking of the genes was mostly similar. I can reproduce this using the fission data package, which contains gene counts for an RNA-seq time course of fission yeast (Leong et al. 2014).

library(edgeR)
library(limma)
library(fission)
data(fission)

dge <- DGEList(counts=assay(fission), samples=data.frame(colData(fission)), genes=data.frame(rowData(fission)))
dge <- calcNormFactors(dge)

treatment <- factor(paste(dge$samples$strain, dge$samples$minute, sep="_"))
design <- model.matrix(~0+treatment)
colnames(design) <- sub("treatment", "", colnames(design))

v <- voom(dge, design)
fit <- lmFit(v, design)
# Define contrasts for the first three timepoints in the mutant strain
my.contrasts <- makeContrasts(
  mut15_vs_0 = mut_15 - mut_0,
  mut30_vs_0 = mut_30 - mut_0,
  mut30_vs_15 = mut_30 - mut_15,
  levels = design
)
fit2 <- contrasts.fit(fit,my.contrasts)
fit.eb <- eBayes(fit2)

Since each contrast is completely defined by the other two contrasts I would have expected identical results when using the first two contrasts, i.e comparing 15 and 30 minutes with time 0 (Option I):

topTable(fit.eb, coef=1:2, n=2)

and when using the first and the third contrast, i.e. comparing each timepoint with the previous timepoint (Option II):

topTable(fit.eb, coef=c(1,3), n=2)

This is also stated in section 9.6.1 of the limma user's guide: "Any two contrasts between the three times would give the same result.".

However, output for Option I is:

symbol biotype mut15_vs_0 mut30_vs_0 AveExpr F P.Value adj.P.Val
SPBC365.12c ish1 protein_coding 3.310655 5.499713 6.789024 562.4327 1.537399e-23 6.33079e-20
SPBC4B4.08 ght2 protein_coding 2.616911 4.039343 8.511128 544.2622 2.420539e-23 6.33079e-20

and for Option II:

                              symbol                 biotype        mut15_vs_0 mut30_vs_15 AveExpr       F            P.Value           adj.P.Val
SPBC365.12c          ish1              protein_coding    3.310655    2.189059      6.789024 662.9574 1.575561e-24 1.109037e-20
SPCPB16A4.07 SPCPB16A4.07 protein_coding   5.659329    1.083013      5.525583 615.4990 4.413496e-24 1.553330e-20

Am I overlooking something? Any help would be appreciated.

Best,

Perry

My session info:

Session info -----------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, mingw32             
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  English_United Kingdom.1252 
 tz       Europe/Berlin               
 date     2018-01-31                  

Packages ---------------------------------------------------------------------------------------------------
 package              * version   date       source        
 base                 * 3.4.3     2017-12-06 local         
 Biobase              * 2.38.0    2017-10-31 Bioconductor  
 BiocGenerics         * 0.24.0    2017-10-31 Bioconductor  
 bitops                 1.0-6     2013-08-17 CRAN (R 3.4.1)
 compiler               3.4.3     2017-12-06 local         
 datasets             * 3.4.3     2017-12-06 local         
 DelayedArray         * 0.4.1     2017-11-04 Bioconductor  
 devtools             * 1.13.4    2017-11-09 CRAN (R 3.4.3)
 digest                 0.6.14    2018-01-14 CRAN (R 3.4.3)
 edgeR                * 3.20.7    2018-01-18 Bioconductor  
 fission              * 0.112.0   2018-01-30 Bioconductor  
 GenomeInfoDb         * 1.14.0    2017-10-31 Bioconductor  
 GenomeInfoDbData       1.0.0     2018-01-23 Bioconductor  
 GenomicRanges        * 1.30.0    2017-10-31 Bioconductor  
 graphics             * 3.4.3     2017-12-06 local         
 grDevices            * 3.4.3     2017-12-06 local         
 grid                   3.4.3     2017-12-06 local         
 IRanges              * 2.12.0    2017-10-31 Bioconductor  
 lattice                0.20-35   2017-03-25 CRAN (R 3.4.3)
 limma                * 3.34.5    2017-12-23 Bioconductor  
 locfit                 1.5-9.1   2013-04-20 CRAN (R 3.4.3)
 Matrix                 1.2-12    2017-11-20 CRAN (R 3.4.3)
 matrixStats          * 0.53.0    2018-01-24 CRAN (R 3.4.3)
 memoise                1.1.0     2017-04-21 CRAN (R 3.4.3)
 methods              * 3.4.3     2017-12-06 local         
 parallel             * 3.4.3     2017-12-06 local         
 Rcpp                   0.12.14   2017-11-23 CRAN (R 3.4.3)
 RCurl                  1.95-4.10 2018-01-04 CRAN (R 3.4.3)
 rstudioapi             0.7       2017-09-07 CRAN (R 3.4.3)
 S4Vectors            * 0.16.0    2017-10-31 Bioconductor  
 stats                * 3.4.3     2017-12-06 local         
 stats4               * 3.4.3     2017-12-06 local         
 SummarizedExperiment * 1.8.1     2017-12-19 Bioconductor  
 tools                  3.4.3     2017-12-06 local         
 utils                * 3.4.3     2017-12-06 local         
 withr                  2.1.1     2017-12-19 CRAN (R 3.4.3)
 XVector                0.18.0    2017-10-31 Bioconductor  
 yaml                   2.1.16    2017-12-12 CRAN (R 3.4.3)
 zlibbioc               1.24.0    2017-10-31 Bioconductor

limma rnaseq • 1.0k views

ADD COMMENT • link updated 7.2 years ago by Gordon Smyth 52k • written 7.2 years ago by Perry Moerland ▴ 130

score 2 · Accepted Answer · 2018-01-31

No, you haven't overlooked anything. What the limma Guide says is generally correct, as we can see using the limma-trend pipeline for the same data:

> y <- new("EList")
> y$E <- cpm(dge,log=TRUE,prior=3)
> y$genes <- dge$genes
> fitt <- lmFit(y,design)
> fitt <- contrasts.fit(fitt, my.contrasts)
> fitt <- eBayes(fitt,trend=TRUE)

> topTable(fitt,coef=c(1,2))
                   symbol        biotype mut15_vs_0 mut30_vs_0 AveExpr   F  P.Value adj.P.Val
SPBC365.12c          ish1 protein_coding       3.32     5.5106    6.79 722 7.48e-23  5.27e-19
SPBC4B4.08           ght2 protein_coding       2.62     4.0410    8.51 587 9.41e-22  3.31e-18
SPCPB16A4.07 SPCPB16A4.07 protein_coding       5.66     6.7439    5.54 477 1.19e-20  2.80e-17
SPBC21C3.19   SPBC21C3.19 protein_coding       5.40     6.7760    6.73 425 4.79e-20  8.44e-17
SPAC26F1.04c         etr1 protein_coding       2.15     3.7243    5.71 386 1.54e-19  2.16e-16
SPAC22F8.05   SPAC22F8.05 protein_coding       3.63     5.4667    7.34 308 2.36e-18  2.77e-15
SPAC343.19           lsb6 protein_coding      -1.53     0.0272    4.59 302 2.97e-18  2.99e-15
SPAC11E3.14   SPAC11E3.14 protein_coding       1.52     2.5936    6.03 298 3.49e-18  3.07e-15
SPCC794.04c   SPCC794.04c protein_coding       5.41     6.9769    6.01 289 5.09e-18  3.48e-15
SPBC660.07           ntp1 protein_coding       1.83     3.1229    6.87 289 5.12e-18  3.48e-15
> topTable(fitt,coef=c(1,3))
                   symbol        biotype mut15_vs_0 mut30_vs_15 AveExpr   F  P.Value adj.P.Val
SPBC365.12c          ish1 protein_coding       3.32        2.19    6.79 722 7.48e-23  5.27e-19
SPBC4B4.08           ght2 protein_coding       2.62        1.42    8.51 587 9.41e-22  3.31e-18
SPCPB16A4.07 SPCPB16A4.07 protein_coding       5.66        1.08    5.54 477 1.19e-20  2.80e-17
SPBC21C3.19   SPBC21C3.19 protein_coding       5.40        1.38    6.73 425 4.79e-20  8.44e-17
SPAC26F1.04c         etr1 protein_coding       2.15        1.58    5.71 386 1.54e-19  2.16e-16
SPAC22F8.05   SPAC22F8.05 protein_coding       3.63        1.83    7.34 308 2.36e-18  2.77e-15
SPAC343.19           lsb6 protein_coding      -1.53        1.56    4.59 302 2.97e-18  2.99e-15
SPAC11E3.14   SPAC11E3.14 protein_coding       1.52        1.07    6.03 298 3.49e-18  3.07e-15
SPCC794.04c   SPCC794.04c protein_coding       5.41        1.56    6.01 289 5.09e-18  3.48e-15
SPBC660.07           ntp1 protein_coding       1.83        1.30    6.87 289 5.12e-18  3.48e-15

However, voom produces observation weights, and limma makes some approximations for speed when there are weights. So, when you are using the voom, the two F-tests are not exactly equal any more (even though they theoretically should be).