Question: DESeq2 transformation to use for PCA plot
Shobana Sekar (Phoenix, AZ) wrote, 2.9 years ago:

Hi,

I created a PCA plot for our RNA-seq count dataset following the instructions in the vignette, using the rlog transformation. The plot was generated, but I got this warning message when I called the rlog function:

Warning message:
In sparseTest(counts(object, normalized = TRUE), 0.9, 100, 0.1) :
  the rlog assumes that data is close to a negative binomial distribution, an assumption
which is sometimes not compatible with datasets where many genes have many zero counts
despite a few very large counts.
In this data, for 15.9% of genes with a sum of normalized counts above 100, it was the case 
that a single sample's normalized count made up more than 90% of the sum over all samples.
the threshold for this warning is 10% of genes. See plotSparsity(dds) for a visualization of this.
We recommend instead using the varianceStabilizingTransformation or shifted log (see vignette).

So if I understand this correctly, for 15.9% of the genes whose normalized counts sum to more than 100, a single sample has a very large count that accounts for over 90% of that sum, and the warning is triggered because this exceeds the 10% threshold.
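
To make that reading concrete, here is a rough base-R sketch of the kind of check the warning describes. It is not the DESeq2 internal code: `sparse_fraction` and the toy matrix are hypothetical, and only the cutoffs (0.9 for the single-sample share, 100 for the row sum) are taken from the warning text.

```r
# Hypothetical helper, not part of DESeq2: among genes whose counts sum to
# more than `min_sum`, return the fraction in which one sample contributes
# more than `max_share` of the row sum.
sparse_fraction <- function(norm_counts, max_share = 0.9, min_sum = 100) {
  row_sum <- rowSums(norm_counts)
  row_max <- apply(norm_counts, 1, max)
  keep <- row_sum > min_sum
  if (!any(keep)) return(NA_real_)
  mean(row_max[keep] / row_sum[keep] > max_share)
}

# Toy matrix with made-up numbers: 4 genes x 6 samples.
toy <- rbind(
  geneA = c(0, 0, 0, 0, 0, 500),      # one sample holds the entire sum
  geneB = c(40, 35, 50, 45, 38, 42),  # counts spread evenly
  geneC = c(1, 0, 2, 0, 1, 300),      # one sample holds about 99% of the sum
  geneD = c(5, 3, 4, 6, 2, 1)         # sum is 21, below min_sum, so ignored
)

# geneA and geneC are flagged out of the 3 genes that pass the filter.
frac <- sparse_fraction(toy)
print(frac)
```

On a real dataset the input would presumably be counts(dds, normalized = TRUE), the matrix named in the warning, and the result would be compared against the 10% threshold.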

However, I am not sure if this matters while doing a PCA analysis? I tried doing the PCA with both rld and vsd transformed data and the plots look very different. So could you help me understand which method is preferred/recommended in this case?

I have RNAseq count data from HTSeq counts. There are 6 replicates each in the control and affected group and I am interested in looking at the differentially expressed genes between the 2 groups. I am doing the PCA as more of a quality assessment step, to see if there are any outlier samples in the set. For my heatmaps, I use the vsd transformed data.

Thanks!

Shobana

 

Answer: DESeq2 transformation to use for PCA plot
Michael Love (United States) wrote, 2.9 years ago:

"However, I am not sure if this matters while doing a PCA analysis?"

Yes, it matters. The rlog() warning is telling you not to use the rlog on this dataset and to use the VST instead.

"So could you help me understand which method is preferred/recommended in this case?"

Use the VST, or simply the shifted log, log2(normalized count + 1), which can be computed with normTransform().
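
As a minimal sketch of those two alternatives, assuming the DESeq2 package is installed: the simulated `dds` below is only a stand-in for the real DESeqDataSet (6 vs 6 samples, as in the question), and the `condition` column name comes from the simulated data, so it may differ in your own design.

```r
library(DESeq2)

# Simulated stand-in for the real data: 12 samples in two groups of 6.
# Replace `dds` with the DESeqDataSet built from your own count files.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Size factors are needed for normalized counts (normTransform, plotSparsity).
dds <- estimateSizeFactors(dds)

# Variance stabilizing transformation. blind = TRUE ignores the design,
# which suits an unsupervised quality-assessment PCA.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Shifted log: log2(normalized count + 1).
ntd <- normTransform(dds)

# PCA on each transformation, colored by the grouping variable.
plotPCA(vsd, intgroup = "condition")
plotPCA(ntd, intgroup = "condition")

# Visualization mentioned in the warning: for each gene, the largest single
# sample's share of the row sum, plotted against the row sum.
plotSparsity(dds)
```

The same `vsd` object can then be reused for heatmaps, so the PCA and the heatmaps are based on the same transformed values.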

 

Answer: DESeq2 transformation to use for PCA plot
2.9 years ago by
Phoenix, AZ
Shobana Sekar20 wrote:

Okay, thanks! I'll use the vst then for my PCA plots. 

Thank you!
