I am quite confused about PCA on count data. The RUVSeq manual applies PCA to raw count data without the variance stabilization suggested by DESeq. Is it then possible that the PCA plots produced by RUVSeq do not reflect reality, given that the VST in DESeq accounts for sequencing depth and stabilizes the variance of small counts?
Can you be a bit more specific as to what part of the RUVSeq manual is applying PCA on raw count data?
It looks like RUVSeq uses EDASeq for a number of utility methods, including the plotPCA function. If you take a look at the source code for plotPCA, you'll see that it will by default log transform the counts prior to running the PCA.
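To make the log-transform point concrete, here is a minimal base-R sketch of a PCA on log-transformed counts. It is not the EDASeq source: the toy matrix, the log(x + 1) pseudocount, and the choice of gene-wise centering without scaling are assumptions made for illustration.

```r
# Toy count matrix: 100 genes x 4 samples (hypothetical data)
set.seed(1)
counts <- matrix(rpois(400, lambda = 50), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:4)))

# Log-transform with a pseudocount so that zero counts stay finite
log_counts <- log(counts + 1)

# prcomp expects samples in rows; center each gene, do not rescale
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

Plotting `pca$x[, 1:2]` would give a sample-level scatter comparable in spirit to a PCA plot, but note that nothing here corrects for library size.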
Even if it log transforms the data, it still does not account for sequencing depth and does not apply variance stabilization.
page 3:
to display unnormalized data:
filtered are the raw counts.
set <- newSeqExpressionSet(as.matrix(filtered), phenoData = data.frame(x, row.names=colnames(filtered)))
plotPCA(set, col=colors[x], cex=1.2)
page 6:
to display normalized data which accounts only for Batch effect (empirical control):
empirical is the set of least significantly DE genes, based on a first-pass DE analysis performed prior to RUVg normalization.
set2 <- RUVg(set, empirical, k=1)
plotRLE(set2, outline=FALSE, ylim=c(-4, 4), col=colors[x])
plotPCA(set2, col=colors[x], cex=1.2)
So it normalizes against a set of control genes, but it never takes sequencing depth into account (which DESeq does with sizeFactors), nor does it apply variance stabilization (which DESeq also suggests before performing PCA).
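For readers unfamiliar with the size-factor idea mentioned here, the following base-R sketch shows a median-of-ratios estimate, which is the approach DESeq's size factors are based on. The toy data and the helper name `size_factors` are hypothetical, and this is a simplified illustration rather than the DESeq code.

```r
# Hypothetical toy data: sample s2 is s1 sequenced at twice the depth
base <- c(10, 20, 40, 80, 160)
counts <- cbind(s1 = base, s2 = 2 * base, s3 = base)

# Median-of-ratios size factors (sketch; helper name is made up)
size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))   # log geometric mean of each gene
  keep <- is.finite(log_geo_mean)    # drop genes with a zero in any sample
  apply(m, 2, function(col) exp(median((log(col) - log_geo_mean)[keep])))
}

sf <- size_factors(counts)

# Divide each sample (column) by its size factor
normalized <- sweep(counts, 2, sf, "/")
```

In this toy case the size factor of s2 is twice that of s1, so the depth difference disappears after division. A log transform alone would leave a constant offset between the two samples.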
What the package can do for you really depends, I think, on which functions you use. Regarding sequencing depth, between-lane normalization is covered on page 3 of the RUVSeq vignette:
https://www.bioconductor.org/packages/3.3/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf
set <- betweenLaneNormalization(set, which="upper")
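As a rough illustration of what upper-quartile scaling does, here is a base-R sketch on a toy matrix. It only shows the general idea (divide each sample by its 75th percentile, rescaled by the mean upper quartile); the data are hypothetical, and the details of the actual betweenLaneNormalization implementation may differ.

```r
# Hypothetical toy data: s2 has twice the sequencing depth of s1
base <- c(0, 5, 10, 20, 40, 80, 160, 320)
counts <- cbind(s1 = base, s2 = 2 * base)

# Upper quartile (75th percentile) of each sample
uq <- apply(counts, 2, quantile, probs = 0.75)

# Scale factors relative to the mean upper quartile
scale_factors <- uq / mean(uq)

# Divide each sample (column) by its scale factor
uq_normalized <- sweep(counts, 2, scale_factors, "/")
```

After scaling, the two toy samples are identical, which is the sense in which this step addresses sequencing depth. It does not stabilize the variance of low counts.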