Question

limma results versus topTable

0

Entering edit mode

Anne Deslattes Mays ▴ 10

@anne-deslattes-mays-5991

Last seen 7.2 years ago

United States

Hello,

I'm trying to create a reduced matrix after running lmFit, contrasts.fit and eBayes from limma. To do so, ideally the rownames of the results match the rownames of the original matrix.

For example, this is the case with topTable:

>CasevsCtrl.top100 = topTable(iefit,coef=1,n=100,sort="p",p=0.05)[,-1]
>head(CasevsCtrl.top100)
        AveExpr         t      P.Value    adj.P.Val        B
20876 0.8908066 -22.45433 2.598162e-93 2.307428e-89 201.9057
20881 0.8300441 -21.67655 5.092048e-88 2.261124e-84 189.7929
44967 0.9034543 -19.07309 6.212197e-71 1.839017e-67 150.6919
57087 0.9278437 -18.70029 1.411561e-68 3.134017e-65 145.3004
57078 0.9373613 -18.54277 1.373644e-67 2.439866e-64 143.0397
57079 0.9444422 -18.36227 1.839180e-66 2.722293e-63 140.4619

The rownames for this results allow subsetting from the original matrix. However, if one would rather have sorted extremes from the top under and top over expressed -- I use the results table -- which does not preserve the original rownames from the original matrix, making subsetting a little trickier.

>results = decideTests(iefit,adjust="BH",p=0.05)

To get the relative under expressing results -- one looks for the results in the column of interest == -1. To get the relative over expressing results -- one looks for the results in the column of interest == 1. However, now the rownames are not preserved relative to the original matrix. See below:

> head(results[results[,1] == -1,])

    Contrasts
      CasevsCtrl 
  11             -1   
  15             -1   
  17             -1   
  27             -1   
  165            -1  
  167            -1  
> head(results[results[,1] == 1,])

    Contrasts
      CasevsCtrl 
  141             1   
  143             1   
  175             1   
  182             1   
  207             1  
  218             1

Typically I would like to use the rownames as a means to subset a matrix with the most significant differences from the original matrix -- but this is hampered if the rownames are not preserved. My question is -- am I doing this wrong? Is there another way to subset?

Thanks in advance,

Anne

> sessionInfo()

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] circlize_0.4.1              ComplexHeatmap_1.12.0      
 [3] RTCGA.clinical_20151101.4.0 RTCGA_1.4.0                
 [5] edgeR_3.16.5                limma_3.30.13              
 [7] plyr_1.8.4                  rtracklayer_1.34.2         
 [9] GenomicRanges_1.26.4        GenomeInfoDb_1.10.3        
[11] AnnotationDbi_1.36.2        IRanges_2.8.2              
[13] S4Vectors_0.12.2            Biobase_2.34.0             
[15] AnnotationHub_2.6.5         BiocGenerics_0.20.0        
[17] readr_1.1.1                 stringr_1.2.0              
[19] tidyr_0.7.2                 reshape2_1.4.2             
[21] ggplot2_2.2.1               BiocInstaller_1.24.0       

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2              rjson_0.2.15                 
 [3] class_7.3-14                  modeltools_0.2-21            
 [5] mclust_5.3                    XVector_0.14.1               
 [7] GlobalOptions_0.0.12          ggpubr_0.1.5                 
 [9] flexmix_2.3-14                bit64_0.9-7                  
[11] interactiveDisplayBase_1.12.0 mvtnorm_1.0-6                
[13] xml2_1.1.1                    splines_3.3.3                
[15] mnormt_1.5-5                  robustbase_0.92-7            
[17] knitr_1.17                    Rsamtools_1.26.2             
[19] broom_0.4.2                   km.ci_0.5-2                  
[21] kernlab_0.9-25                cluster_2.0.6                
[23] shiny_1.0.5                   httr_1.3.1                   
[25] assertthat_0.2.0              Matrix_1.2-11                
[27] lazyeval_0.2.0                htmltools_0.3.6              
[29] tools_3.3.3                   bindrcpp_0.2                 
[31] gtable_0.2.0                  glue_1.1.1                   
[33] dplyr_0.7.4                   ggthemes_3.4.0               
[35] Rcpp_0.12.13                  trimcluster_0.1-2            
[37] Biostrings_2.42.1             nlme_3.1-131                 
[39] fpc_2.1-10                    psych_1.7.8                  
[41] rvest_0.3.2                   mime_0.5                     
[43] XML_3.98-1.9                  dendextend_1.5.2             
[45] DEoptimR_1.0-8                zlibbioc_1.20.0              
[47] MASS_7.3-47                   zoo_1.8-0                    
[49] scales_0.5.0                  BSgenome_1.42.0              
[51] hms_0.3                       SummarizedExperiment_1.4.0   
[53] RColorBrewer_1.1-2            yaml_2.1.14                  
[55] curl_3.0                      memoise_1.1.0                
[57] gridExtra_2.3                 KMsurv_0.1-5                 
[59] stringi_1.1.5                 RSQLite_2.0                  
[61] BiocParallel_1.8.2            shape_1.4.3                  
[63] rlang_0.1.2                   pkgconfig_2.0.1              
[65] prabclus_2.2-6                bitops_1.0-6                 
[67] lattice_0.20-35               purrr_0.2.4                  
[69] bindr_0.1                     GenomicAlignments_1.10.1     
[71] cmprsk_2.2-7                  bit_1.1-12                   
[73] magrittr_1.5                  R6_2.2.2                     
[75] DBI_0.7                       whisker_0.3-2                
[77] foreign_0.8-69                survival_2.41-3              
[79] RCurl_1.95-4.8                nnet_7.3-12                  
[81] tibble_1.3.4                  survMisc_0.5.4               
[83] viridis_0.4.0                 GetoptLong_0.1.6             
[85] locfit_1.5-9.1                data.table_1.10.4-2          
[87] blob_1.1.0                    digest_0.6.12                
[89] diptest_0.75-7                xtable_1.8-2                 
[91] httpuv_1.3.5                  munsell_0.4.3                
[93] viridisLite_0.2.0             survminer_0.4.0

limma • 1.1k views

ADD COMMENT • link updated 7.2 years ago by Gordon Smyth 52k • written 7.2 years ago by Anne Deslattes Mays ▴ 10

0

Entering edit mode

Okay -- I am wrong the rownames are in fact preserved. However, I still do not get what I want -- what I want is to have the rows that are most significantly over and under expressed -- the results gives the breakdown, but they are not sorted by p-value -- so I imagine I can do this. But advice as to the most elegant way to approach this is appreciated.

ADD REPLY • link 7.2 years ago Anne Deslattes Mays ▴ 10

score 1 · Answer 1 · 2017-10-27

Using the example data from ?lmFit, we get an MArrayLM object called 'fit'.

> alldat <- topTable(fit,2,Inf)
> rslt <- decideTests(fit)
> ups <- row.names(rslt)[rslt[,2] > 0]
> dns <- row.names(rslt)[rslt[,2] < 0]
> alldat[row.names(alldat) %in% ups,]
       logFC AveExpr    t  P.Value adj.P.Val    B
Gene 1  2.07   0.818 9.19 9.40e-06   0.00094 4.04
Gene 2  1.79   1.328 7.12 6.73e-05   0.00336 1.97

There are no down-regulated genes, but you get the point.

score 0 · Answer 2 · 2017-10-29

James has given a good way using row names. But there are lots of ways to do this without using row names. Why not simply:

tab <- topTable(iefit,coef=1,n=Inf,sort="p",p=0.05)
tabUp <- tab[tab$logFC>0,]
tabDn <- tab[tab$logFC<0,]

Or else using decideTests:

tab <- topTable(iefit,coef=1,n=Inf,sort="none")
results = decideTests(iefit,adjust="BH",p=0.05)
tabUp <- tab[results[,1]<0,]
tabDn <- tab[results[,1]>0,]