Question

RUVseq PCA on raw data, whereas DESeq suggests stabilization

tonja.r ▴ 80

@tonjar-7565

United Kingdom

I am quite confused about PCA on count data. The RUVSeq manual applies PCA to raw count data without the variance stabilization suggested by DESeq. Is it then possible that the PCA plots produced by RUVSeq do not reflect reality, given that DESeq's VST accounts for sequencing depth and stabilizes the variance of small counts?

ruvseq deseq • 4.4k views

ADD COMMENT • link 9.0 years ago tonja.r ▴ 80

Can you be a bit more specific as to what part of the RUVSeq manual is applying PCA on raw count data?

It looks like RUVSeq uses EDASeq for a number of utility methods, including the plotPCA function. If you take a look at the source code for plotPCA, you'll see that it will by default log transform the counts prior to running the PCA.
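
To make the "log transform, then PCA" step concrete, here is a minimal base R sketch. It is not the EDASeq source: the simulated count matrix, the `log(count + 1)` transform, and the use of `prcomp` are assumptions chosen for illustration.

```r
set.seed(1)

# Simulated count matrix (assumption): 200 genes in rows, 6 samples in columns.
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# Log transform with a pseudocount of 1 so that zero counts stay finite.
log_counts <- log(counts + 1)

# prcomp expects samples in rows; centre each gene but do not rescale it.
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Scores of the six samples on the first two principal components.
pc_scores <- pca$x[, 1:2]

# Fraction of total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

Running the same `prcomp` call on `t(counts)` instead of `t(log_counts)` shows how much the highest-count genes dominate when no transform is applied.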

ADD REPLY • link 9.0 years ago Steve Lianoglou ★ 13k

Even if it log transforms the data, it still does not account for sequencing depth and does not apply variance stabilization.

page 3, to display unnormalized data (filtered holds the raw counts):

set <- newSeqExpressionSet(as.matrix(filtered),
                           phenoData = data.frame(x, row.names=colnames(filtered)))
plotPCA(set, col=colors[x], cex=1.2)

page 6, to display normalized data that accounts only for batch effects (empirical controls):

empirical is the set of least significantly DE genes, based on a first-pass DE analysis performed prior to RUVg normalization.

set2 <- RUVg(set, empirical, k=1)
plotRLE(set2, outline=FALSE, ylim=c(-4, 4), col=colors[x])
plotPCA(set2, col=colors[x], cex=1.2)

So it normalizes to a set of genes, but it never takes the sequencing depth into account (which DESeq does with sizeFactors), nor does it apply variance stabilization (which DESeq also suggests before performing PCA).

ADD REPLY • link 9.0 years ago tonja.r ▴ 80

What the package can do for you really depends, I think, on which functions you use. Regarding sequencing depth, between-lane normalization is covered on page 3 of the RUVSeq vignette:
https://www.bioconductor.org/packages/3.3/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf

set <- betweenLaneNormalization(set, which="upper")
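
As a rough illustration of what upper-quartile scaling does to sequencing depth, here is a base R sketch. It is not the EDASeq implementation: the helper name `upper_quartile_scale` and the toy matrix are hypothetical, and the packaged function may additionally round or offset the result, which this sketch does not.

```r
set.seed(7)

# Toy counts (assumption): three samples with identical composition
# but 1x, 2x and 3x sequencing depth.
base <- rpois(100, lambda = 50) + 1
counts <- cbind(s1 = base, s2 = 2 * base, s3 = 3 * base)

# Hypothetical helper: divide each sample by its upper quartile,
# rescaled so that the scaling factors average to 1.
upper_quartile_scale <- function(counts) {
  uq <- apply(counts, 2, quantile, probs = 0.75)
  scaling <- uq / mean(uq)
  sweep(counts, 2, scaling, "/")
}

scaled <- upper_quartile_scale(counts)
```

After scaling, the three columns of `scaled` coincide, because the samples differed only in depth.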

ADD REPLY • link 9.0 years ago Joseph Bundy ▴ 20

Michael Love 42k

@mikelove

United States

re: The variance of raw counts grows with the mean. Those plots are not showing the variance-mean relationship of raw counts.

ADD COMMENT • link 9.0 years ago Michael Love 42k

score 3 · Accepted Answer · 2015-11-23

davide risso ▴ 980

@davide-risso-5075

University of Padova

As Steve mentioned in his comment, the RUVSeq vignette uses the EDASeq plotPCA function, which by default log transforms the data, so we do not apply PCA to the raw counts but to the log-transformed counts. It is true that, in principle, a variance stabilizing transformation could help to better visualize the data with PCA, but in practice we see that the first PCs are very stable, regardless of whether the log or the VST transformation is used.

As for the differences in sequencing depth, we use PCA to show that the data cluster "better" after normalization, assuming that we want to see the samples cluster by biological condition in the space of the first two principal components. We do PCA on both unnormalized data (figure 1) and upper-quartile normalized data (figure 2), the latter accounting for sequencing depth.

ADD COMMENT • link 9.0 years ago davide risso ▴ 980

So one uses upper-quartile normalization, obtains the unwanted factors, adds them to the design matrix, and then runs DESeq on the raw counts with that design matrix, using a totally different normalization. It should still be possible to normalize the counts with DESeq (sizeFactors), run RUVSeq on those normalized counts to obtain the unwanted factors, and then run the DESeq analysis, so that finding the unwanted factors and the differential analysis are performed with the same normalization. To me it seems a bit strange to use different normalizations for finding the unwanted factors and for the differential analysis.

ADD REPLY • link 9.0 years ago tonja.r ▴ 80

Yes, you're right. It is not correct to compute the factors of unwanted variation from upper-quartile normalized data and then feed them into DESeq (which uses DESeq size factors).

One can in principle compute the factors of unwanted variation (UV) from unnormalized data, but in practice it is better to use scaled counts (for instance, using upper-quartile normalization). In edgeR, you can simply use upper-quartile normalization both for computing the UV factors and in the DE model.

If you want to use DESeq normalization and model, you can run DESeq normalization, retrieve the normalized counts with

counts(., normalized=TRUE)

and then run RUV on that. This will give you factors that you can input into the DESeq model.
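
The idea can be sketched in base R without either package. Everything below is an assumption made for illustration: the simulated counts, the helper `median_ratio_size_factors` (a median-of-ratios estimate in the spirit of DESeq size factors), the choice of the first 100 genes as negative controls, and the SVD step, which is a simplified stand-in for RUVg rather than the RUVSeq implementation.

```r
set.seed(42)

# Simulated data (assumption): 300 genes, 6 samples with different depths.
n_genes <- 300
depth <- c(1, 2, 0.5, 1.5, 1, 0.8)
mu <- rgamma(n_genes, shape = 2, rate = 0.01)
counts <- sapply(depth, function(d) rnbinom(n_genes, mu = mu * d, size = 10))

# Hypothetical helper: median-of-ratios size factors.
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # -Inf for genes with any zero count
  usable <- is.finite(log_geo_means)       # keep genes expressed in all samples
  apply(counts, 2, function(col) {
    exp(median(log(col[usable]) - log_geo_means[usable]))
  })
}

sf <- median_ratio_size_factors(counts)

# Depth-normalized counts: the kind of matrix that would be passed to RUV.
norm_counts <- sweep(counts, 2, sf, "/")

# Simplified RUVg-style step: take the first left singular vector of the
# centred log counts of the control genes as one factor of unwanted variation.
control_idx <- 1:100                        # hypothetical negative controls
Y <- t(log(norm_counts + 1))                # samples in rows, genes in columns
Yc <- scale(Y, center = TRUE, scale = FALSE)
W <- svd(Yc[, control_idx])$u[, 1, drop = FALSE]
colnames(W) <- "W_1"
```

In a real analysis the estimated factor would then enter the model as a covariate, for example with a design formula along the lines of `~ W_1 + condition`, where `condition` is a placeholder for the biological variable of interest.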

Note that this is not related to PCA or plotPCA, as RUV doesn't use either to compute the UV factors.

Adding an example of how to use RUVSeq with DESeq to the vignette has been on my to-do list for a while now. I will try to push for it.

ADD REPLY • link 9.0 years ago davide risso ▴ 980