Moderated F-statistics for time course experiment
1
0
Entering edit mode
@perry-moerland-1109
Last seen 2.0 years ago
Bioinformatics Laboratory, Academic Med…

Dear all,

In an RNA-seq time course experiment I was analyzing I was expecting identical moderated F-statistics (calculated using topTable) for a set of contrasts where each timepoint is compared with baseline and the 'equivalent' set of contrasts where each timepoint is compared with the previous timepoint. However, the F-statistics and therefore the corresponding (adjusted) p-values turned out to be different although the ranking of the genes was mostly similar. I can reproduce this using the fission data package, which contains gene counts for an RNA-seq time course of fission yeast (Leong et al. 2014). 

library(edgeR)
library(limma)
library(fission)
data(fission)

dge <- DGEList(counts=assay(fission), samples=data.frame(colData(fission)), genes=data.frame(rowData(fission)))
dge <- calcNormFactors(dge)

treatment <- factor(paste(dge$samples$strain, dge$samples$minute, sep="_"))
design <- model.matrix(~0+treatment)
colnames(design) <- sub("treatment", "", colnames(design))

v <- voom(dge, design)
fit <- lmFit(v, design)
# Define contrasts for the first three timepoints in the mutant strain
my.contrasts <- makeContrasts(
  mut15_vs_0 = mut_15 - mut_0,
  mut30_vs_0 = mut_30 - mut_0,
  mut30_vs_15 = mut_30 - mut_15,
  levels = design
)
fit2 <- contrasts.fit(fit,my.contrasts)
fit.eb <- eBayes(fit2)

Since each contrast is completely defined by the other two contrasts I would have expected identical results when using the first two contrasts, i.e comparing 15 and 30 minutes with time 0 (Option I):

topTable(fit.eb, coef=1:2, n=2)

and when using the first and the third contrast, i.e. comparing each timepoint with the previous timepoint (Option II):

topTable(fit.eb, coef=c(1,3), n=2)

This is also stated in section 9.6.1 of the limma user's guide: "Any two contrasts between the three times would give the same result.".

However, output for Option I is:

                      symbol        biotype     mut15_vs_0 mut30_vs_0  AveExpr        F            P.Value           adj.P.Val
SPBC365.12c   ish1   protein_coding   3.310655   5.499713    6.789024  562.4327  1.537399e-23  6.33079e-20
SPBC4B4.08    ght2  protein_coding    2.616911   4.039343    8.511128   544.2622  2.420539e-23  6.33079e-20

and for Option II:

                              symbol                 biotype        mut15_vs_0 mut30_vs_15  AveExpr       F            P.Value           adj.P.Val
SPBC365.12c          ish1              protein_coding    3.310655    2.189059      6.789024  662.9574  1.575561e-24  1.109037e-20
SPCPB16A4.07 SPCPB16A4.07  protein_coding   5.659329    1.083013      5.525583  615.4990  4.413496e-24  1.553330e-20

 

Am I overlooking something? Any help would be appreciated.

Best,

Perry

 

My session info:

Session info -----------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, mingw32             
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  English_United Kingdom.1252 
 tz       Europe/Berlin               
 date     2018-01-31                  

Packages ---------------------------------------------------------------------------------------------------
 package              * version   date       source        
 base                 * 3.4.3     2017-12-06 local         
 Biobase              * 2.38.0    2017-10-31 Bioconductor  
 BiocGenerics         * 0.24.0    2017-10-31 Bioconductor  
 bitops                 1.0-6     2013-08-17 CRAN (R 3.4.1)
 compiler               3.4.3     2017-12-06 local         
 datasets             * 3.4.3     2017-12-06 local         
 DelayedArray         * 0.4.1     2017-11-04 Bioconductor  
 devtools             * 1.13.4    2017-11-09 CRAN (R 3.4.3)
 digest                 0.6.14    2018-01-14 CRAN (R 3.4.3)
 edgeR                * 3.20.7    2018-01-18 Bioconductor  
 fission              * 0.112.0   2018-01-30 Bioconductor  
 GenomeInfoDb         * 1.14.0    2017-10-31 Bioconductor  
 GenomeInfoDbData       1.0.0     2018-01-23 Bioconductor  
 GenomicRanges        * 1.30.0    2017-10-31 Bioconductor  
 graphics             * 3.4.3     2017-12-06 local         
 grDevices            * 3.4.3     2017-12-06 local         
 grid                   3.4.3     2017-12-06 local         
 IRanges              * 2.12.0    2017-10-31 Bioconductor  
 lattice                0.20-35   2017-03-25 CRAN (R 3.4.3)
 limma                * 3.34.5    2017-12-23 Bioconductor  
 locfit                 1.5-9.1   2013-04-20 CRAN (R 3.4.3)
 Matrix                 1.2-12    2017-11-20 CRAN (R 3.4.3)
 matrixStats          * 0.53.0    2018-01-24 CRAN (R 3.4.3)
 memoise                1.1.0     2017-04-21 CRAN (R 3.4.3)
 methods              * 3.4.3     2017-12-06 local         
 parallel             * 3.4.3     2017-12-06 local         
 Rcpp                   0.12.14   2017-11-23 CRAN (R 3.4.3)
 RCurl                  1.95-4.10 2018-01-04 CRAN (R 3.4.3)
 rstudioapi             0.7       2017-09-07 CRAN (R 3.4.3)
 S4Vectors            * 0.16.0    2017-10-31 Bioconductor  
 stats                * 3.4.3     2017-12-06 local         
 stats4               * 3.4.3     2017-12-06 local         
 SummarizedExperiment * 1.8.1     2017-12-19 Bioconductor  
 tools                  3.4.3     2017-12-06 local         
 utils                * 3.4.3     2017-12-06 local         
 withr                  2.1.1     2017-12-19 CRAN (R 3.4.3)
 XVector                0.18.0    2017-10-31 Bioconductor  
 yaml                   2.1.16    2017-12-12 CRAN (R 3.4.3)
 zlibbioc               1.24.0    2017-10-31 Bioconductor  
limma rnaseq • 802 views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 8 hours ago
WEHI, Melbourne, Australia

No, you haven't overlooked anything. What the limma Guide says is generally correct, as we can see using the limma-trend pipeline for the same data:

 

> y <- new("EList")
> y$E <- cpm(dge,log=TRUE,prior=3)
> y$genes <- dge$genes
> fitt <- lmFit(y,design)
> fitt <- contrasts.fit(fitt, my.contrasts)
> fitt <- eBayes(fitt,trend=TRUE)

 

> topTable(fitt,coef=c(1,2))
                   symbol        biotype mut15_vs_0 mut30_vs_0 AveExpr   F  P.Value adj.P.Val
SPBC365.12c          ish1 protein_coding       3.32     5.5106    6.79 722 7.48e-23  5.27e-19
SPBC4B4.08           ght2 protein_coding       2.62     4.0410    8.51 587 9.41e-22  3.31e-18
SPCPB16A4.07 SPCPB16A4.07 protein_coding       5.66     6.7439    5.54 477 1.19e-20  2.80e-17
SPBC21C3.19   SPBC21C3.19 protein_coding       5.40     6.7760    6.73 425 4.79e-20  8.44e-17
SPAC26F1.04c         etr1 protein_coding       2.15     3.7243    5.71 386 1.54e-19  2.16e-16
SPAC22F8.05   SPAC22F8.05 protein_coding       3.63     5.4667    7.34 308 2.36e-18  2.77e-15
SPAC343.19           lsb6 protein_coding      -1.53     0.0272    4.59 302 2.97e-18  2.99e-15
SPAC11E3.14   SPAC11E3.14 protein_coding       1.52     2.5936    6.03 298 3.49e-18  3.07e-15
SPCC794.04c   SPCC794.04c protein_coding       5.41     6.9769    6.01 289 5.09e-18  3.48e-15
SPBC660.07           ntp1 protein_coding       1.83     3.1229    6.87 289 5.12e-18  3.48e-15
> topTable(fitt,coef=c(1,3))
                   symbol        biotype mut15_vs_0 mut30_vs_15 AveExpr   F  P.Value adj.P.Val
SPBC365.12c          ish1 protein_coding       3.32        2.19    6.79 722 7.48e-23  5.27e-19
SPBC4B4.08           ght2 protein_coding       2.62        1.42    8.51 587 9.41e-22  3.31e-18
SPCPB16A4.07 SPCPB16A4.07 protein_coding       5.66        1.08    5.54 477 1.19e-20  2.80e-17
SPBC21C3.19   SPBC21C3.19 protein_coding       5.40        1.38    6.73 425 4.79e-20  8.44e-17
SPAC26F1.04c         etr1 protein_coding       2.15        1.58    5.71 386 1.54e-19  2.16e-16
SPAC22F8.05   SPAC22F8.05 protein_coding       3.63        1.83    7.34 308 2.36e-18  2.77e-15
SPAC343.19           lsb6 protein_coding      -1.53        1.56    4.59 302 2.97e-18  2.99e-15
SPAC11E3.14   SPAC11E3.14 protein_coding       1.52        1.07    6.03 298 3.49e-18  3.07e-15
SPCC794.04c   SPCC794.04c protein_coding       5.41        1.56    6.01 289 5.09e-18  3.48e-15
SPBC660.07           ntp1 protein_coding       1.83        1.30    6.87 289 5.12e-18  3.48e-15

 

However, voom produces observation weights, and limma makes some approximations for speed when there are weights. So, when you are using the voom, the two F-tests are not exactly equal any more (even though they theoretically should be).

ADD COMMENT
0
Entering edit mode

Dear Gordon,

I was already suspecting that voom was causing the observed differences. Thanks for confirming this and showing that everything is indeed as expected with limma-trend. What would your recommendation be when using the voom pipeline and having to choose between three 'equivalent' sets of contrasts (coef=c(1,2), coef=c(1,3), coef=c(2,3)) for calculating the F-statistics as in the example above? It is very likely that each set of contrasts will lead to a slightly different list of genes for a specific cut-off on the adjusted P-value. Do you consider these differences negligible in general?

ADD REPLY

Login before adding your answer.

Traffic: 847 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6