
DESeq2 rlog error

Asked by bruce.moran (@brucemoran-8388), Ireland

Hi,

I have been using DESeq2 for a while; it is a good tool and I have never had any issues. Now I am getting an error from rlog() when using the 'fast' argument, which is essential as far as I am concerned:

> rldss<-rlog(ddss, fast=T)

Error in rlog(ddss, fast = T) : unused argument (fast = T)

I note that the option has been removed from the documentation. If so, can anyone explain why? And are there other quick ways to do this transformation? It is purely for plotting a PCA.

Appreciate any help,

Bruce.

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_IE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_IE.UTF-8        LC_COLLATE=en_IE.UTF-8
 [5] LC_MONETARY=en_IE.UTF-8    LC_MESSAGES=en_IE.UTF-8
 [7] LC_PAPER=en_IE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] ggplot2_1.0.1              genefilter_1.52.0
 [3] DESeq2_1.10.0              RcppArmadillo_0.6.100.0.0
 [5] Rcpp_0.12.1                SummarizedExperiment_1.0.0
 [7] Biobase_2.30.0             GenomicRanges_1.22.0
 [9] GenomeInfoDb_1.6.0         IRanges_2.4.1
[11] S4Vectors_0.8.0            BiocGenerics_0.16.0
[13] BiocInstaller_1.20.0

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3
 [4] XVector_0.10.0       futile.options_1.0.0 tools_3.2.2
 [7] zlibbioc_1.16.0      rpart_4.1-10         digest_0.6.8
[10] RSQLite_1.0.0        annotate_1.48.0      gtable_0.1.2
[13] lattice_0.20-33      DBI_0.3.1            proto_0.3-10
[16] gridExtra_2.0.0      cluster_2.0.3        stringr_1.0.0
[19] locfit_1.5-9.1       nnet_7.3-11          grid_3.2.2
[22] AnnotationDbi_1.32.0 XML_3.98-1.3         survival_2.38-3
[25] BiocParallel_1.4.0   foreign_0.8-66       latticeExtra_0.6-26
[28] Formula_1.2-1        geneplotter_1.48.0   reshape2_1.4.1
[31] lambda.r_1.1.7       magrittr_1.5         scales_0.3.0
[34] Hmisc_3.17-0         MASS_7.3-44          splines_3.2.2
[37] xtable_1.7-4         colorspace_1.2-6     stringi_1.0-1
[40] acepack_1.3-3.3      munsell_0.4.2


Accepted Answer · 2015-10-23

Michael Love (@mikelove), United States

hi Bruce,

After exploring ways to speed up the rlog, I have come to prefer that users use varianceStabilizingTransformation() for datasets with many (e.g. hundreds of) samples. The rlog has nice properties that we show in the paper, but it requires fitting a parameter for each sample. The 'fast' rlog was an approximation I was working on, but I came to think that the VST is preferable. The VST just applies a function to the matrix of counts, so it is even faster. The only bottleneck with the VST is estimating the dispersion trend (which the rlog also requires). If you have already estimated the dispersions (for example, after running DESeq()), you can use:

vsd <- varianceStabilizingTransformation(dds, blind=FALSE)
plotPCA(vsd)

which should take less than a second to return.
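As a sketch of what "just applying a function to the matrix of counts" means, the base-R code below assumes DESeq2's default parametric dispersion trend, alpha(mu) = a0 + a1/mu. Under that assumption the variance mu + alpha(mu) * mu^2 has a closed-form variance-stabilizing function, which is applied elementwise to the normalized counts with no per-sample fitting. The coefficient values a0 and a1, the size factors, the helper name vst_parametric and the toy counts are all hypothetical, and this is a re-derivation for illustration, not DESeq2's own code.

```r
# Closed-form VST for an ASSUMED parametric dispersion trend
#   alpha(mu) = a0 + a1 / mu, so Var = mu + alpha(mu) * mu^2 = (1 + a1) * mu + a0 * mu^2
# Hypothetical helper: the integral of 1 / sqrt(Var), shifted and rescaled
# so that the result behaves like log2(q) for large q.
vst_parametric <- function(q, a0, a1) {
  log((1 + a1 + 2 * a0 * q + 2 * sqrt(a0 * q * (1 + a1 + a0 * q))) /
        (4 * a0)) / log(2)
}

set.seed(1)
# Toy data (hypothetical): 6 genes x 4 samples of negative binomial counts
counts <- matrix(rnbinom(24, mu = 100, size = 10), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# Assumed size factors (in DESeq2 these come from estimateSizeFactors())
size_factors <- c(0.8, 1.0, 1.1, 1.2)
norm_counts <- sweep(counts, 2, size_factors, "/")

# Illustrative trend coefficients, not fitted to these data
vst_mat <- vst_parametric(norm_counts, a0 = 0.05, a1 = 1)
round(vst_mat, 2)
```

For large q the argument of the logarithm is approximately 4 * a0 * q, so vst_parametric(q) approaches log2(q); near zero it levels off at a finite value instead of diverging like log2, which is what tames the noisy low counts.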

I have a fast routine for estimating the dispersion trend. Update: as of February 2016, I have added it to the devel branch as a function called vst().
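A minimal usage sketch of the vst() function mentioned above, on simulated data from DESeq2's makeExampleDESeqDataSet(). It assumes a DESeq2 release that already includes vst() (check ?vst in your installation). vst() fits the dispersion trend on a subset of genes (nsub, 1000 by default) and needs enough genes with a mean normalized count above 5, which is why 2000 simulated genes are used here rather than the default 1000.

```r
library(DESeq2)

# Simulated dataset shipped with DESeq2: 2000 genes, 12 samples, with a
# two-level "condition" column in colData
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# blind = TRUE (the default) re-estimates the dispersion trend without
# using the design; size factors are estimated first if missing
vsd <- vst(dds, blind = TRUE)

# PCA of the 500 most variable genes (plotPCA's default ntop),
# coloured by condition; plotPCA returns a ggplot object
pca_plot <- plotPCA(vsd, intgroup = "condition")
print(pca_plot)
```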


Hi Michael,

Many thanks for the answer; I will change my scripts accordingly.

Bruce.

Reply by bruce.moran


If the objective is to just plot a PCA, why would you specify blind=FALSE?

Reply by enricoferrero


There is some discussion of this in the vignette, but basically: if there are many large differences across conditions, then blind=TRUE (the default) "sees" these differences as variability and may "over-transform" the data to temper this apparent dispersion. I'm speaking very loosely here, but that's the idea. With blind=FALSE, the transformation only considers the within-condition variability, and so results in a transformation that is closer to log2. For a fuller comparison, see the section on transformations in the vignette.

For a very fast PCA plot you can always try normTransform(), which just corrects for library size, adds a pseudocount and applies a log2 transform. Until I write up the fast routine for the VST, this is definitely the fastest way to produce transformed data.
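To show roughly what normTransform() followed by plotPCA() compute, here is a base-R sketch on simulated counts. Assumptions: size factors use the median-of-ratios method (DESeq2's default estimator, re-implemented here rather than called), the pseudocount is 1 and the log is base 2 (normTransform()'s defaults), and the PCA uses the 500 most variable genes, as plotPCA() does by default. The simulated design, library depths and variable names are hypothetical.

```r
set.seed(42)
# Hypothetical data: 2000 genes x 8 samples, 4 per condition
n_genes <- 2000
condition <- factor(rep(c("A", "B"), each = 4))
depth <- c(0.5, 1, 1.5, 2, 0.5, 1, 1.5, 2)     # relative library sizes
base_mu <- rexp(n_genes, rate = 1 / 200)        # per-gene mean expression
mu_mat <- outer(base_mu, depth)
# First 200 genes are 4-fold higher in condition B
fold <- ifelse(seq_len(n_genes) <= 200, 4, 1)
mu_mat[, condition == "B"] <- mu_mat[, condition == "B"] * fold
counts <- matrix(rnbinom(length(mu_mat), mu = mu_mat, size = 5),
                 nrow = n_genes)

# Median-of-ratios size factors: genes with any zero count are skipped
log_geo_means <- rowMeans(log(counts))          # -Inf if a row has a zero
use <- is.finite(log_geo_means)
size_factors <- apply(counts, 2, function(x) {
  exp(median(log(x[use]) - log_geo_means[use]))
})

# normTransform()-style values: log2(normalized counts + 1)
norm_log <- log2(sweep(counts, 2, size_factors, "/") + 1)

# PCA on the 500 most variable genes, samples as rows
row_var <- apply(norm_log, 1, var)
top <- order(row_var, decreasing = TRUE)[seq_len(500)]
pca <- prcomp(t(norm_log[top, ]))
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2], col = as.integer(condition), pch = 19,
     xlab = sprintf("PC1: %.0f%% variance", percent_var[1]),
     ylab = sprintf("PC2: %.0f%% variance", percent_var[2]))
```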

Reply by Michael Love