Question

Interpretation of adjusted p value from topTable() in paired LIMMA analysis

0

Entering edit mode

Lisa.Hopcroft • 0

@lisahopcroft-8456

Last seen 8.8 years ago

Dear all,

First of all: I apologise if this question has been answered before. Please feel free to direct me elsewhere.

I have a dataset that I'm analysing using a paired (SibShip type) model. I'd like to ask how I can interpret the adjusted p values from the complete topTable() command. And how/why those adjusted p values may differ from those generated by topTable() when specifying the coefficient.

A minimal working example and sessionInfo() are shown below. To clarify with respect to this specific example, I'd like to know what the values of topTable(fit)$adj.P.Val mean (in the context of a paired analysis) and why the values of topTable(fit,coef="treatment")$adj.P.Val are different.

Thanks and best wishes,

Lisa

set.seed(5482)

sd <- 0.3*sqrt(4/rchisq(100,df=4))
y <- matrix(rnorm(100*6,sd=sd),100,6)
rownames(y) <- paste("Gene",1:100)
y[1:2,4:6] <- y[1:2,4:6] + 2

design <- cbind(sample=c(0,1,0,1,0,1),treatment=c(0,0,0,1,1,1))

fit <- lmFit(y,design)
fit <- eBayes(fit)

topTable(fit)
topTable(fit,coef="treatment")

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] edgeR_3.8.6                Rsubread_1.16.1            pd.hugene.1.0.st.v1_3.10.0
 [4] oligo_1.30.0               Biostrings_2.34.1          XVector_0.6.0             
 [7] oligoClasses_1.28.0        hgu133plus2cdf_2.15.0      hgu133acdf_2.15.0         
[10] readxl_0.1.0               affy_1.44.0                limma_3.22.7              
[13] org.Hs.eg.db_3.0.0         RSQLite_1.0.0              DBI_0.3.1                 
[16] GSEABase_1.28.0            graph_1.44.1               annotate_1.44.0           
[19] XML_3.98-1.1               AnnotationDbi_1.28.2       GenomeInfoDb_1.2.5        
[22] IRanges_2.0.1              S4Vectors_0.4.0            Biobase_2.26.0            
[25] BiocGenerics_0.12.1        biomaRt_2.22.0             RColorBrewer_1.1-2        
[28] gplots_2.17.0              ggplot2_1.0.1              reshape_0.8.5             

loaded via a namespace (and not attached):
 [1] affxparser_1.38.0     affyio_1.34.0         BiocInstaller_1.16.5  bit_1.1-12           
 [5] bitops_1.0-6          caTools_1.17.1        codetools_0.2-11      colorspace_1.2-6     
 [9] digest_0.6.8          ff_2.2-13             foreach_1.4.2         gdata_2.16.1         
[13] GenomicRanges_1.18.4  grid_3.1.0            gtable_0.1.2          gtools_3.4.2         
[17] iterators_1.0.7       KernSmooth_2.23-14    labeling_0.3          magrittr_1.5         
[21] MASS_7.3-40           munsell_0.4.2         plyr_1.8.2            preprocessCore_1.28.0
[25] proto_0.3-10          Rcpp_0.11.6           RCurl_1.95-4.6        reshape2_1.4.1       
[29] scales_0.2.4          splines_3.1.0         stringi_0.4-1         stringr_1.0.0        
[33] tools_3.1.0           xtable_1.7-4          zlibbioc_1.12.0

limma • 2.3k views

ADD COMMENT • link updated 8.8 years ago by James W. MacDonald 66k • written 8.8 years ago by Lisa.Hopcroft • 0

score 4 · Accepted Answer · 2015-07-23

UsIng limma, when you don't specify the coefficient the default is to do an F-test on all the coefficients in the model. This is not something of interest in a paired design (the patient-level coefficients are 'nuisance' variables that you are estimating in order to account for patient-level differences, but they are not usually of interest).

This is not (AFAICT) something that is pointed out in the help page for topTable(), but it isn't too hard to deduce by looking at the output.

> topTable(fit)
          Grp1 Grp2vs1 AveExpr     F  P.Value adj.P.Val
Gene 1  -0.195   2.048   0.829 91.26 2.19e-06  0.000219
Gene 25  0.645  -1.588  -0.149 17.38 1.07e-03  0.040036
Gene 2  -0.122   1.810   0.783 16.79 1.20e-03  0.040036
Gene 84  0.903  -1.011   0.398 10.37 5.50e-03  0.137402
Gene 8   0.718  -0.819   0.309  9.36 7.42e-03  0.148379
Gene 12 -1.273   3.239   0.347  7.94 1.18e-02  0.158054
Gene 73  0.276   0.419   0.485  7.81 1.23e-02  0.158054
Gene 62  0.568  -0.735   0.201  7.74 1.26e-02  0.158054
Gene 23 -0.147  -0.482  -0.388  6.55 1.96e-02  0.217787
Gene 35  0.353   0.106   0.406  5.49 3.01e-02  0.278705
> topTable(fit, coef = 2)
         logFC AveExpr     t  P.Value adj.P.Val      B
Gene 1   2.048  0.8286 10.50 4.38e-06  0.000438  4.816
Gene 25 -1.588 -0.1494 -5.79 3.52e-04  0.017620  0.152
Gene 2   1.810  0.7827  4.38 2.13e-03  0.071030 -1.768
Gene 12  3.239  0.3467  3.90 4.23e-03  0.105737 -2.493
Gene 84 -1.011  0.3977 -3.58 6.74e-03  0.116657 -2.983
Gene 8  -0.819  0.3087 -3.46 8.13e-03  0.116657 -3.178
Gene 62 -0.735  0.2007 -3.45 8.17e-03  0.116657 -3.183
Gene 22 -0.698 -0.0224 -2.68 2.70e-02  0.337142 -4.413
Gene 17  0.617 -0.1776  2.29 5.01e-02  0.557193 -5.032
Gene 38  0.787 -0.2097  2.18 6.01e-02  0.558799 -5.209

Note that in the first case you get columns that match the column names of your design matrix, and there is an F column (implying F-statistic), whereas if you specify a coefficient you just get the logFC and AveExpr columns and a column labeled t (for t-statistic).

If you don't specify a coefficient for topTable(), the p-value is associated with the hypothesis test that any of the coefficients are different from zero, whereas when you do specify a (single) coefficient, you are testing that single coefficient.