Search
Question: limma results versus topTable
0
11 months ago by
United States
Anne Deslattes Mays10 wrote:

Hello,

I'm trying to create a reduced matrix after running lmFit, contrasts.fit and eBayes from limma.   To do so, ideally the rownames of the results match the rownames of the original matrix.

For example, this is the case with topTable:

>CasevsCtrl.top100 = topTable(iefit,coef=1,n=100,sort="p",p=0.05)[,-1]
20876 0.8908066 -22.45433 2.598162e-93 2.307428e-89 201.9057
20881 0.8300441 -21.67655 5.092048e-88 2.261124e-84 189.7929
44967 0.9034543 -19.07309 6.212197e-71 1.839017e-67 150.6919
57087 0.9278437 -18.70029 1.411561e-68 3.134017e-65 145.3004
57078 0.9373613 -18.54277 1.373644e-67 2.439866e-64 143.0397
57079 0.9444422 -18.36227 1.839180e-66 2.722293e-63 140.4619

The rownames for this results allow subsetting from the original matrix.   However, if one would rather have sorted extremes from the top under and top over expressed -- I use the results table -- which does not preserve the original rownames from the original matrix, making subsetting a little trickier.

>results = decideTests(iefit,adjust="BH",p=0.05)


To get the relative under expressing results -- one looks for the results in the column of interest == -1.  To get the relative over expressing results -- one looks for the results in the column of interest == 1.  However, now the rownames are not preserved relative to the original matrix. See below:

> head(results[results[,1] == -1,])

Contrasts
CasevsCtrl
11             -1
15             -1
17             -1
27             -1
165            -1
167            -1
> head(results[results[,1] == 1,])
    Contrasts
CasevsCtrl
141             1
143             1
175             1
182             1
207             1
218             1  

Typically I would like to use the rownames as a means to subset a matrix with the most significant differences from the original matrix -- but this is hampered if the rownames are not preserved.  My question is -- am I doing this wrong?  Is there another way to subset?

Anne

> sessionInfo()

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats4    parallel  stats     graphics  grDevices utils
[8] datasets  methods   base

other attached packages:
[1] circlize_0.4.1              ComplexHeatmap_1.12.0
[3] RTCGA.clinical_20151101.4.0 RTCGA_1.4.0
[5] edgeR_3.16.5                limma_3.30.13
[7] plyr_1.8.4                  rtracklayer_1.34.2
[9] GenomicRanges_1.26.4        GenomeInfoDb_1.10.3
[11] AnnotationDbi_1.36.2        IRanges_2.8.2
[13] S4Vectors_0.12.2            Biobase_2.34.0
[15] AnnotationHub_2.6.5         BiocGenerics_0.20.0
[19] tidyr_0.7.2                 reshape2_1.4.2
[21] ggplot2_2.2.1               BiocInstaller_1.24.0

loaded via a namespace (and not attached):
[1] colorspace_1.3-2              rjson_0.2.15
[3] class_7.3-14                  modeltools_0.2-21
[5] mclust_5.3                    XVector_0.14.1
[7] GlobalOptions_0.0.12          ggpubr_0.1.5
[9] flexmix_2.3-14                bit64_0.9-7
[11] interactiveDisplayBase_1.12.0 mvtnorm_1.0-6
[13] xml2_1.1.1                    splines_3.3.3
[15] mnormt_1.5-5                  robustbase_0.92-7
[17] knitr_1.17                    Rsamtools_1.26.2
[19] broom_0.4.2                   km.ci_0.5-2
[21] kernlab_0.9-25                cluster_2.0.6
[23] shiny_1.0.5                   httr_1.3.1
[25] assertthat_0.2.0              Matrix_1.2-11
[27] lazyeval_0.2.0                htmltools_0.3.6
[29] tools_3.3.3                   bindrcpp_0.2
[31] gtable_0.2.0                  glue_1.1.1
[33] dplyr_0.7.4                   ggthemes_3.4.0
[35] Rcpp_0.12.13                  trimcluster_0.1-2
[37] Biostrings_2.42.1             nlme_3.1-131
[39] fpc_2.1-10                    psych_1.7.8
[41] rvest_0.3.2                   mime_0.5
[43] XML_3.98-1.9                  dendextend_1.5.2
[45] DEoptimR_1.0-8                zlibbioc_1.20.0
[47] MASS_7.3-47                   zoo_1.8-0
[49] scales_0.5.0                  BSgenome_1.42.0
[51] hms_0.3                       SummarizedExperiment_1.4.0
[53] RColorBrewer_1.1-2            yaml_2.1.14
[55] curl_3.0                      memoise_1.1.0
[57] gridExtra_2.3                 KMsurv_0.1-5
[59] stringi_1.1.5                 RSQLite_2.0
[61] BiocParallel_1.8.2            shape_1.4.3
[63] rlang_0.1.2                   pkgconfig_2.0.1
[65] prabclus_2.2-6                bitops_1.0-6
[67] lattice_0.20-35               purrr_0.2.4
[69] bindr_0.1                     GenomicAlignments_1.10.1
[71] cmprsk_2.2-7                  bit_1.1-12
[73] magrittr_1.5                  R6_2.2.2
[75] DBI_0.7                       whisker_0.3-2
[77] foreign_0.8-69                survival_2.41-3
[79] RCurl_1.95-4.8                nnet_7.3-12
[81] tibble_1.3.4                  survMisc_0.5.4
[83] viridis_0.4.0                 GetoptLong_0.1.6
[85] locfit_1.5-9.1                data.table_1.10.4-2
[87] blob_1.1.0                    digest_0.6.12
[89] diptest_0.75-7                xtable_1.8-2
[91] httpuv_1.3.5                  munsell_0.4.3
[93] viridisLite_0.2.0             survminer_0.4.0   
modified 11 months ago by Gordon Smyth35k • written 11 months ago by Anne Deslattes Mays10

Okay -- I am wrong the rownames are in fact preserved.  However, I still do not get what I want -- what I want is to have the rows that are most significantly over and under expressed -- the results gives the breakdown, but they are not sorted by p-value -- so I imagine I can do this.   But advice as to the most elegant way to approach this is appreciated.

1
11 months ago by
United States
James W. MacDonald47k wrote:

Using the example data from ?lmFit, we get an MArrayLM object called 'fit'.

> alldat <- topTable(fit,2,Inf)
> rslt <- decideTests(fit)
> ups <- row.names(rslt)[rslt[,2] > 0]
> dns <- row.names(rslt)[rslt[,2] < 0]
> alldat[row.names(alldat) %in% ups,]
logFC AveExpr    t  P.Value adj.P.Val    B
Gene 1  2.07   0.818 9.19 9.40e-06   0.00094 4.04
Gene 2  1.79   1.328 7.12 6.73e-05   0.00336 1.97

There are no down-regulated genes, but you get the point.

0
11 months ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

James has given a good way using row names. But there are lots of ways to do this without using row names. Why not simply:

tab <- topTable(iefit,coef=1,n=Inf,sort="p",p=0.05)
tabUp <- tab[tab$logFC>0,] tabDn <- tab[tab$logFC<0,]


Or else using decideTests:

tab <- topTable(iefit,coef=1,n=Inf,sort="none")
tabDn <- tab[results[,1]>0,]