Question: RUVseq PCA on raw data, whereas DESeq suggests stabilization
1
2.7 years ago by
tonja.r40
United Kingdom
tonja.r40 wrote:

I am quite confused about PCA on count data. The RUVSeq manual applies PCA to raw count data without the variance stabilization suggested by DESeq. Is it then possible that the PCA plots produced by RUVSeq do not reflect reality, given that the VST in DESeq accounts for sequencing depth and stabilizes the variance of small counts?


Can you be a bit more specific as to what part of the RUVSeq manual is applying PCA on raw count data?

It looks like RUVSeq uses EDASeq for a number of utility methods, including the plotPCA function. If you take a look at the source code for plotPCA, you'll see that it will by default log transform the counts prior to running the PCA.

Even if it log transforms the data, it still does not account for sequencing depth and does not apply variance stabilization.

page 3:
to display unnormalized data:
filtered are the raw counts.

set <- newSeqExpressionSet(as.matrix(filtered), phenoData = data.frame(x, row.names=colnames(filtered)))
plotPCA(set, col=colors[x], cex=1.2)

page 6:
to display normalized data which accounts only for Batch effect (empirical control):

empirical is the set of least significantly DE genes, based on a first-pass DE analysis performed prior to RUVg normalization.

set2 <- RUVg(set, empirical, k=1)
plotRLE(set2, outline=FALSE, ylim=c(-4, 4), col=colors[x])
plotPCA(set2, col=colors[x], cex=1.2)

So, it normalizes to a set of genes, but it never takes into account the sequencing depth (which DESeq does with sizeFactors), nor does it apply variance stabilization (which DESeq also suggests before performing PCA).

2

What the package can do for you I think really depends on what functions you make use of.  Re: sequencing depth, between lane normalization is covered on page 3 of the RUVseq vignette:
https://www.bioconductor.org/packages/3.3/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf

set <- betweenLaneNormalization(set, which="upper")

3
2.7 years ago by
davide risso740
Weill Cornell Medicine
davide risso740 wrote:

As Steve mentioned in the comment, the RUVSeq vignette uses the EDASeq plotPCA function, which by default log transforms the data, so we do not apply PCA to the raw counts but to the log-transformed counts. It is true that in principle a variance stabilizing transformation could help to visualize the data better with PCA, but we see that in practice the first PCs are very stable, regardless of whether the log or the VST transformation is used.

As for the differences in sequencing depth, we use PCA to show that the data cluster "better" after normalization, assuming that we want to see the samples clustering by biological condition in the space of the first two principal components. We do PCA both on unnormalized (figure 1) and upper-quartile normalized data (figure 2), hence accounting for sequencing depth.
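To make the two steps above concrete, here is a minimal base-R sketch on simulated counts: PCA on log-transformed counts, before and after a simple upper-quartile scaling. It is not the EDASeq implementation of plotPCA or betweenLaneNormalization; the simulated data, the depth values, and the helper names upper_quartile_scale and log_pca are assumptions made for illustration only.

```r
# Illustrative sketch only: simulated counts, hypothetical helper names.
set.seed(1)
n_genes <- 500
depth <- c(1, 2, 4, 1, 2, 4)                 # assumed relative sequencing depths
base_mean <- rgamma(n_genes, shape = 2, scale = 50)
counts <- sapply(depth, function(d) rpois(n_genes, base_mean * d))
colnames(counts) <- paste0("s", seq_along(depth))

# Scale each sample so that all samples share the same upper quartile.
upper_quartile_scale <- function(x) {
  uq <- apply(x, 2, quantile, probs = 0.75)
  sweep(x, 2, uq / mean(uq), "/")
}

# PCA on log(count + 1); samples are rows after transposing.
log_pca <- function(x) prcomp(t(log(x + 1)))

pca_unscaled <- log_pca(counts)
scaled <- upper_quartile_scale(counts)
pca_scaled <- log_pca(scaled)

# In this simulation depth is the only systematic difference between samples,
# so PC1 of the unscaled data follows log(depth) closely.
cor(pca_unscaled$x[, 1], log(depth))
```

Plotting pca_unscaled$x[, 1:2] and pca_scaled$x[, 1:2] side by side gives the same kind of before/after comparison as the two figures mentioned above.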

Then one uses upper-quartile normalization, obtains the unwanted factors, adds them to the design matrix, and runs DESeq on the raw counts with that matrix, which uses a totally different normalization. It is still possible to normalize the counts with DESeq (sizeFactors), run RUVSeq on those normalized counts to obtain the unwanted factors, and then run the DESeq analysis, so that the estimation of unwanted factors and the differential analysis are performed on the same normalization. To me it seems a bit strange to use different normalizations for finding unwanted factors and for performing the differential analysis.

Yes, you're correct. It is not correct to compute the unwanted variation from upper-quartile normalized data and input them into DESeq (with DESeq size factors).

One can in principle compute the factors of unwanted variation (UV) from unnormalized data, but in practice it is better to use scaled counts (for instance using upper-quartile normalization). In edgeR, you can simply use the upper-quartile normalization both for computing the UV factors and in the DE model.

If you want to use DESeq normalization and model, you can run DESeq normalization, retrieve the normalized counts with

counts(., normalized=TRUE)

and then run RUV on that. This will give you factors that you can include in the DESeq model.
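A rough sketch of that workflow is below. It assumes the DESeq2 and RUVSeq packages are installed (the thread says "DESeq"; using DESeq2 here is an assumption), that filtered, x and empirical are the objects from the question (raw count matrix, condition factor, control gene names), and that a single factor (k=1) is wanted. The wrapper name ruv_then_deseq is hypothetical, and this is not an official example from either vignette.

```r
# Sketch only: assumes DESeq2 and RUVSeq are installed.
# `ruv_then_deseq` is a hypothetical wrapper name, not part of either package.
ruv_then_deseq <- function(filtered, x, empirical) {
  col_data <- data.frame(x = x, row.names = colnames(filtered))

  # Step 1: DESeq2 size factors and normalized counts.
  dds0 <- DESeq2::DESeqDataSetFromMatrix(countData = as.matrix(filtered),
                                         colData = col_data, design = ~ x)
  dds0 <- DESeq2::estimateSizeFactors(dds0)
  norm_counts <- BiocGenerics::counts(dds0, normalized = TRUE)

  # Step 2: estimate one factor of unwanted variation from the control genes.
  ruv <- RUVSeq::RUVg(norm_counts, empirical, k = 1, round = FALSE)

  # Step 3: fit the DESeq2 model on the raw counts, with the factor as covariate.
  col_data$W_1 <- ruv$W[, 1]
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = as.matrix(filtered),
                                        colData = col_data, design = ~ W_1 + x)
  DESeq2::DESeq(dds)
}

# Usage, given the objects from the question:
# dds <- ruv_then_deseq(filtered, x, empirical)
# res <- DESeq2::results(dds)
```

The design puts W_1 before x so that the condition effect is tested after adjusting for the unwanted factor; the same size-factor normalization is used both for estimating W_1 and for the differential analysis.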

Note that this is not related to PCA or plotPCA, as RUV doesn't use either to compute the UV factors.

Adding an example of how to use RUVSeq with DESeq to the vignette has been on my to-do list for a while now. I will try to push for it.

0
2.7 years ago by
Michael Love19k
United States
Michael Love19k wrote:

re: The variance of raw counts grows with the mean. Those plots are not showing the variance-mean relationship of raw counts.