Question: RUVseq PCA on raw data, whereas DESeq suggests stabilization
tonja.r (United Kingdom) wrote 2.5 years ago:

I am quite confused about PCA on count data. The RUVSeq manual applies PCA to raw count data without the variance stabilization suggested by DESeq. Is it then possible that the PCA plots produced by RUVSeq do not reflect reality, given that the VST of DESeq accounts for sequencing depth and stabilizes the variance of small counts?



Can you be a bit more specific as to what part of the RUVSeq manual is applying PCA on raw count data?

It looks like RUVSeq uses EDASeq for a number of utility methods, including the plotPCA function. If you take a look at the source code for plotPCA, you'll see that it will by default log transform the counts prior to running the PCA.
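To make the point concrete, here is a minimal base R sketch of a PCA on log-transformed counts. It is not the EDASeq source: the simulated counts, the pseudocount of 1, and the use of prcomp are all assumptions chosen for illustration.

```r
# Simulated counts: 200 genes x 6 samples (hypothetical data, not from the vignette)
set.seed(1)
n_genes <- 200
depth <- c(1, 1, 1, 3, 3, 3)   # samples 4-6 are sequenced three times deeper
base_mean <- rgamma(n_genes, shape = 2, scale = 50)
counts <- sapply(depth, function(d) rnbinom(n_genes, mu = base_mean * d, size = 5))
colnames(counts) <- paste0("s", 1:6)

# Log transform with a pseudocount of 1 (assumed), then PCA with samples as rows
log_counts <- log(counts + 1)
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:2], 3))

# Sample coordinates on the first two principal components
print(pca$x[, 1:2])
```

The log transform alone does not remove the depth difference built into this simulation, which is the concern raised in the follow-up comment.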

written 2.5 years ago by Steve Lianoglou

Even if it log transforms the data, it still does not account for sequencing depth and does not apply variance stabilization.

Page 3, to display unnormalized data (filtered holds the raw counts):

set <- newSeqExpressionSet(as.matrix(filtered), phenoData = data.frame(x, row.names=colnames(filtered)))
plotPCA(set, col=colors[x], cex=1.2)

Page 6, to display normalized data, which accounts only for the batch effect (empirical controls):

empirical is the set of least significantly DE genes, based on a first-pass DE analysis performed prior to RUVg normalization.

set2 <- RUVg(set, empirical, k=1)
plotRLE(set2, outline=FALSE, ylim=c(-4, 4), col=colors[x])
plotPCA(set2, col=colors[x], cex=1.2)

So it normalizes to a set of genes, but it never takes the sequencing depth into account (which DESeq does with sizeFactors), nor does it apply variance stabilization (which DESeq also suggests before performing PCA).

written 2.5 years ago by tonja.r

What the package can do for you I think really depends on what functions you make use of.  Re: sequencing depth, between lane normalization is covered on page 3 of the RUVseq vignette:

set <- betweenLaneNormalization(set, which="upper")
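As a rough illustration of what upper-quartile scaling does, the base R sketch below divides each sample by its 75th percentile, rescaled so the factors average to 1. This is a simplified stand-in under stated assumptions (simulated Poisson counts, no rounding), not the EDASeq implementation.

```r
# Simulated counts: 500 genes x 4 samples with different sequencing depths
# (hypothetical data for illustration only)
set.seed(2)
n_genes <- 500
depth <- c(1, 2, 4, 0.5)
base_mean <- rgamma(n_genes, shape = 2, scale = 50)
counts <- sapply(depth, function(d) rpois(n_genes, base_mean * d))
colnames(counts) <- paste0("s", 1:4)

# Per-sample upper quartile (75th percentile) of the counts
uq <- apply(counts, 2, quantile, probs = 0.75)

# Rescale the factors so they average to 1, then divide each column by its factor
scale_factor <- uq / mean(uq)
uq_norm <- sweep(counts, 2, scale_factor, "/")

# Upper quartiles before and after scaling
new_uq <- apply(uq_norm, 2, quantile, probs = 0.75)
print(round(uq))
print(round(new_uq))
```

After scaling, all samples share the same upper quartile, so differences in sequencing depth no longer dominate comparisons between samples.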

written 2.5 years ago by Joseph Bundy
davide risso (Weill Cornell Medicine) wrote 2.5 years ago:

As Steve mentioned in the comment, the RUVSeq vignette uses the EDASeq plotPCA function, which by default log transforms the data, so we do not apply PCA to the raw counts but to the log-transformed counts. It is true that in principle a variance stabilizing transformation could help to visualize the data better with PCA, but we see that in practice the first PCs are very stable, regardless of whether the log or the VST transformation is used.

As for the differences in sequencing depth, we use PCA to show that the data cluster "better" after normalization, assuming that we want to see the samples clustering by biological condition in the space of the first two principal components. We do PCA both on unnormalized (figure 1) and upper-quartile normalized data (figure 2), hence accounting for sequencing depth. 



But then one uses upper-quartile normalization to obtain the unwanted factors, adds them to the design matrix, and runs DESeq on the raw counts with that matrix, which applies a completely different normalization. It is still possible to normalize the counts with DESeq (sizeFactors), run RUVSeq on those normalized counts to obtain the unwanted factors, and then run the DESeq analysis, so that finding the unwanted factors and the differential analysis are performed on the same normalization. To me it seems a bit strange to use different normalizations for finding the unwanted factors and for the differential analysis.

written 2.5 years ago by tonja.r

Yes, you're correct. It is not correct to compute the factors of unwanted variation from upper-quartile normalized data and then feed them into DESeq (with DESeq size factors).

One can in principle compute the factors of unwanted variation (UV) from unnormalized data, but in practice it is better to use scaled counts (for instance using upper-quartile normalization). In edgeR, you can simply use the upper-quartile normalization both for computing the UV factors and in the DE model.

If you want to use DESeq normalization and model, you can run DESeq normalization, retrieve the normalized counts with

counts(., normalized=TRUE)

and then run RUV on that. This will give you factors that you can include in the DESeq model.
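For readers unsure what the normalized counts are, the base R sketch below computes median-of-ratios size factors by hand and divides the raw counts by them, which is the kind of matrix that counts(., normalized=TRUE) returns. The simulated data and the variable names are assumptions for illustration; in a real analysis the DESeq functions should be used rather than this hand-rolled version.

```r
# Simulated counts: 500 genes x 4 samples with different sequencing depths
# (hypothetical data for illustration only)
set.seed(3)
n_genes <- 500
depth <- c(1, 2, 4, 0.5)
base_mean <- rgamma(n_genes, shape = 2, scale = 50) + 5
counts <- sapply(depth, function(d) rpois(n_genes, base_mean * d))
colnames(counts) <- paste0("s", 1:4)

# Log geometric mean of each gene across samples; genes with any zero count
# give -Inf and are excluded from the size factor estimate
log_geo_mean <- rowMeans(log(counts))
keep <- is.finite(log_geo_mean)

# Size factor of a sample = median over genes of (count / gene geometric mean)
size_factors <- apply(counts[keep, , drop = FALSE], 2, function(col) {
  exp(median(log(col) - log_geo_mean[keep]))
})

# Normalized counts: raw counts divided by the per-sample size factor
norm_counts <- sweep(counts, 2, size_factors, "/")

print(round(size_factors, 2))
```

A matrix like norm_counts is what would then be passed to RUV, so that the UV factors and the DESeq model rest on the same normalization.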

Note that this is not related to PCA or plotPCA, as RUV doesn't use either to compute the UV factors.

Adding an example of how to use RUVSeq with DESeq to the vignette has been on my to-do list for a while now. I will try to push for it.

written 2.5 years ago by davide risso
Michael Love (United States) wrote 2.5 years ago:

re: The variance of raw counts grows with the mean. Those plots are not showing the variance-mean relationship of raw counts.
