Question: [DESEQ2] VST with rlog + normalization (size factor estimation) for downstream (clustering) analysis.
gravatar for john
2.5 years ago by
john40 wrote:

Hello guys.

I am using the rlog transformation from DEseq2 (see first figure) and it produces a strange graph. 1th graph are random numbers. 2nd are the raw counts. 3rd are normalized raw counts etc. 

Question1): In the 5th graph- Why are the red dots so close to the x axis? It should more or less look like the first graph. Any ideas on whats causing this?


After some thinking I went on and reduced my data keeping only genes that are expressed in among ALL samples (second figure).

Here rlog does what its supposed to do.





Question2) While we are at it- Is this the correct order for normalization (estimateSizeFactors) + VST(variance stabilizing transformation):


1) rlogTransformation(dds)

2) estimateSizeFactors(dds)

It cant be anything else since rlogTransformation(dds) only takes a DEseqdataset as input with integers. After estimateSizeFactors(dds) and obtaining the counts with counts() the integers are transformed to double. So there is no way I could use rlogTransformation(counts(dds,normalized=T)), right?


Thanks a lot!



For the sake of completeness- the figures are generated with the "vsn" package and the transformations with "DEseq2".

ADD COMMENTlink modified 15 months ago by minie0 • written 2.5 years ago by john40
gravatar for Michael Love
2.5 years ago by
Michael Love15k
United States
Michael Love15k wrote:

Can you send me via email the data and code which produced plot #5 and also your sessionInfo()? If the data is simulated, it could be something funny about the simulation for these rows which doesn't accord with the distributional assumptions. [Edit: see my other answer]

rlog() and varianceStabilizingTransformation() will either use pre-existing size factors or calculate size factors internally if they are not already calculated. Both of these functions only take raw counts, and size factors are used internally in the modeling. The whole point is to use the raw count values for modeling variance.

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Michael Love15k

Dear Michael,

I did as you said and sent an email to your address. I accidentally included the library("reshape2") - you dont need that, just remove or ignore it.

The data is not simulated. Here is the session info:

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i686-pc-linux-gnu (32-bit)

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] gridExtra_0.9.1           reshape2_1.4.1           
 [3] vsn_3.34.0                Biobase_2.26.0           
 [5] ggplot2_1.0.1             DESeq2_1.6.3             
 [7] RcppArmadillo_0.4.650.1.1 Rcpp_0.11.5              
 [9] GenomicRanges_1.18.4      GenomeInfoDb_1.2.4       
[11] IRanges_2.0.1             S4Vectors_0.4.0          
[13] BiocGenerics_0.12.1      

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3       affy_1.44.0           affyio_1.34.0        
 [4] annotate_1.44.0       AnnotationDbi_1.28.2  base64enc_0.1-2      
 [7] BatchJobs_1.6         BBmisc_1.9            BiocInstaller_1.16.4
[10] BiocParallel_1.0.3    brew_1.0-6            checkmate_1.5.2      
[13] cluster_2.0.1         codetools_0.2-11      colorspace_1.2-6     
[16] DBI_0.3.1             digest_0.6.8          fail_1.2             
[19] foreach_1.4.2         foreign_0.8-63        Formula_1.2-0        
[22] genefilter_1.48.1     geneplotter_1.44.0    gtable_0.1.2         
[25] Hmisc_3.15-0          iterators_1.0.7       KernSmooth_2.23-14   
[28] lattice_0.20-31       latticeExtra_0.6-26   limma_3.22.7         
[31] locfit_1.5-9.1        MASS_7.3-40           munsell_0.4.2        
[34] nnet_7.3-9            plyr_1.8.1            preprocessCore_1.28.0
[37] proto_0.3-10          RColorBrewer_1.1-2    rpart_4.1-9          
[40] RSQLite_1.0.0         scales_0.2.4          sendmailR_1.2-1      
[43] splines_3.1.1         stringr_0.6.2         survival_2.38-1      
[46] tools_3.1.1           XML_3.98-1.1          xtable_1.7-4         
[49] XVector_0.6.0         zlibbioc_1.12.0

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by john40
gravatar for Michael Love
2.5 years ago by
Michael Love15k
United States
Michael Love15k wrote:

hi John

I took a look and then realized that those are just genes with all 0 counts in the original DESeqDataSet. 

In ?rlog we say:

To avoid returning matrices with NA                     
values, in the case of a row of all zeros, the rlog transformation                              
returns zeros (essentially adding a pseudocount of 1 only to these                              

These rows can be removed beforehand with:

dds <- dds[rowSums(counts(dds)) > 0, ]


ADD COMMENTlink written 2.5 years ago by Michael Love15k

Wow Michael you are fast- do you ever sleep?

Alright it was my assumption as well that the 0 genes are responsible.

Nice that we have clarified this!

ADD REPLYlink written 2.5 years ago by john40
gravatar for minie
15 months ago by
minie0 wrote:

Hello !

I am facing a problem while generating the plot , similar to what @John was doing.But the problem is I am getting an error with ylim i.e.

Error: Unknown parameters: ylim
Execution halted

I dont know what is going wrong were.

ADD COMMENTlink written 15 months ago by minie0

This question has been cross-posted. See A: Variance stabilizing transformation seems not working on NGS (bacterial 16srRNA

ADD REPLYlink written 15 months ago by Wolfgang Huber13k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 223 users visited in the last hour