Entering edit mode
Short story: I ran a PCA on a matrix of counts from an RNA-seq
experiment and then using the voom-tramsformed data, and they're
completely different. I imagine my results will be very different
using DESeq2 vs limma, and I am wondering how to proceed. Here are the
two PCA biplots:
http://imgur.com/a/DBRfE
More background:
I have an RNA-seq experiment with 5 conditions: WT (n=4), and mutants
A, B, C, and D (all n=3). I aligned the reads with STAR, counted reads
mapping to genes using HTSeq-count. I imported the count data into
DESeq2 and processed using the functions described in the vignette,
DESeqDataSetFromHTSeqCount() and DESeq().
I performed a PCA on the transposed normalized counts table from the
DESeq Data Set (dds) object (note the outlier, which is no longer
outlying in the voom PCA plot below):
pca <- as.data.frame(prcomp(t(na.omit(counts(dds,
normalized=TRUE))))$x)
col <- 1:5
groups <- colData(dds)$condition
plot(PC2~PC1, data=pca, bg=col[groups], pch=21, main="DESeq2")
Here's that PCA biplot:
http://i.imgur.com/P5hhE9Q.png
I then used limma/voom to transform the data, and performed another
PCA, which looked completely different.
design <- model.matrix(~0+colData(dds)$condition)
v <- voom(counts=counts(dds, normalized=TRUE), design=design)
pca <- as.data.frame(prcomp(t(na.omit(v$E)))$x)
pca(PC2~PC1, data=pca, bp=col[groups], pch=21, main="voom")
Here's that PCA (notice the outlier C1 is no longer outlying):
http://i.imgur.com/oKlnWh3.png
Based on this PCA I imagine my results will look very different using
DESeq2 or limma on the voom'd data. I understand that this is due to
the PCA on the voom transformation is only considering the normalized
log2 expression values, and not considering the inverse variance
weights. Now I'm just not sure how to proceed with downstream analysis
(or quality control with the voom transformation, in general). Any
guidance is highly appreciated.
Thanks,
Stephen
...
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] limma_3.16.1 calibrate_1.7.1 MASS_7.3-26
[4] DESeq2_1.0.8 RcppArmadillo_0.3.800.1 Rcpp_0.10.3
[7] lattice_0.20-15 Biobase_2.20.0
GenomicRanges_1.12.2
[10] IRanges_1.18.0 BiocGenerics_0.6.0
BiocInstaller_1.10.0
loaded via a namespace (and not attached):
[1] annotate_1.38.0 AnnotationDbi_1.22.2 DBI_0.2-5
[4] genefilter_1.42.0 grid_3.0.0 locfit_1.5-9
[7] RColorBrewer_1.0-5 RSQLite_0.11.3 splines_3.0.0
[10] stats4_3.0.0 survival_2.37-4 tools_3.0.0
[13] XML_3.95-0.2 xtable_1.7-1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bioc-pca-deseq.png
Type: image/png
Size: 31145 bytes
Desc: not available
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130423="" 99cf6269="" attachment.png="">
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bioc-pca-voom.png
Type: image/png
Size: 30142 bytes
Desc: not available
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130423="" 99cf6269="" attachment-0001.png="">