Question

DESeq2 transformation to use for PCA plot


Shobana Sekar ▴ 20

@shobana-sekar-6409


Phoenix, AZ

Hi,

I created a PCA plot for our RNA-seq count dataset following the instructions in the vignette, using the rlog transformation. The plot was generated, but I got this warning message when I called the rlog function:

Warning message:
In sparseTest(counts(object, normalized = TRUE), 0.9, 100, 0.1) :
the rlog assumes that data is close to a negative binomial distribution, an assumption
which is sometimes not compatible with datasets where many genes have many zero counts
despite a few very large counts.
In this data, for 15.9% of genes with a sum of normalized counts above 100, it was the case
that a single sample's normalized count made up more than 90% of the sum over all samples.
the threshold for this warning is 10% of genes. See plotSparsity(dds) for a visualization of this.
We recommend instead using the varianceStabilizingTransformation or shifted log (see vignette).

So if I understand this correctly: among genes whose normalized counts sum to more than 100, 15.9% have a single sample that accounts for over 90% of that sum, and the warning is triggered because this exceeds the 10% threshold.
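The criterion described in the warning can be worked through on toy numbers. The following is a minimal sketch in plain Python, not DESeq2's own code: the function name `sparsity_fraction` and the toy matrix are hypothetical, and only the three thresholds (sum above 100, share above 90%, warning above 10% of genes) are taken from the warning text.

```python
def sparsity_fraction(norm_counts, max_share=0.9, min_sum=100):
    """Among genes (rows) whose normalized counts sum to more than
    `min_sum`, return the fraction in which one sample contributes
    more than `max_share` of the row sum.
    Hypothetical helper that mirrors the wording of the warning."""
    shares = []
    for row in norm_counts:
        total = sum(row)
        if total > min_sum:                  # only genes with enough counts
            shares.append(max(row) / total)  # largest single-sample share
    if not shares:
        return None                          # no gene passes the sum filter
    return sum(share > max_share for share in shares) / len(shares)


# Toy normalized counts: rows are genes, columns are samples (made-up values).
toy = [
    [30, 28, 35, 31],  # sum 124, largest share about 0.28
    [0, 0, 500, 2],    # sum 502, one sample holds about 99.6% of the sum
    [1, 0, 2, 0],      # sum 3, skipped because the sum is not above 100
    [60, 55, 70, 65],  # sum 250, largest share 0.28
]

fraction = sparsity_fraction(toy)
# The warning quotes a threshold of 10% of genes.
would_warn = fraction is not None and fraction > 0.1
print(fraction, would_warn)
```

Here one of the three genes that pass the sum filter is dominated by a single sample, so the fraction is one third and the 10% threshold is exceeded.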

However, I am not sure whether this matters for PCA. I tried the PCA with both the rlog-transformed (rld) and VST-transformed (vsd) data, and the plots look very different. Could you help me understand which method is preferred/recommended in this case?

I have RNAseq count data from HTSeq counts. There are 6 replicates each in the control and affected group and I am interested in looking at the differentially expressed genes between the 2 groups. I am doing the PCA as more of a quality assessment step, to see if there are any outlier samples in the set. For my heatmaps, I use the vsd transformed data.

Thanks!

Shobana

deseq2 pca rlog transformation variancestabilizingtransformation

score 2 · Answer 1 · 2016-05-12

"However, I am not sure if this matters while doing a PCA analysis?"

It matters: the warning from rlog() is telling you not to use the rlog for this dataset, and to use the VST instead.

"So could you help me understand which method is preferred/recommended in this case?"

As the warning says, use the VST or simply the shifted log, log2(count + 1), which can be computed with normTransform().
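To make the shifted log concrete, here is a minimal arithmetic sketch in plain Python. It assumes that the counts are first divided by per-sample size factors and that a pseudocount of 1 is added before taking log2; the function name `shifted_log2`, the toy counts, and the size factors are all hypothetical, and this is not DESeq2's implementation of normTransform().

```python
import math


def shifted_log2(counts, size_factors, pseudocount=1):
    """Divide each count by its sample's size factor, add a pseudocount,
    and take log2. Hypothetical helper for illustration only."""
    return [
        [math.log2(c / s + pseudocount) for c, s in zip(row, size_factors)]
        for row in counts
    ]


# Toy raw counts: rows are genes, columns are samples (made-up values).
toy_counts = [
    [0, 3, 7],
    [15, 30, 60],
]
# Assumed size factors: the third sample was sequenced twice as deeply.
toy_size_factors = [1.0, 1.0, 2.0]

transformed = shifted_log2(toy_counts, toy_size_factors)
for row in transformed:
    print([round(v, 3) for v in row])
```

The pseudocount keeps zero counts finite (they map to 0), and dividing by the size factor puts the second and third samples of the second gene on the same scale.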

score 0 · Answer 2 · 2016-05-12
