DESeq2 transformation to use for PCA plot
@shobana-sekar-6409

Hi,

I created a PCA plot for our RNAseq count dataset following the instructions in the vignette, using the rlog transformation. The plot was generated, but I got this warning message when I called the rlog function:

Warning message:
In sparseTest(counts(object, normalized = TRUE), 0.9, 100, 0.1) :
  the rlog assumes that data is close to a negative binomial distribution, an assumption
which is sometimes not compatible with datasets where many genes have many zero counts
despite a few very large counts.
In this data, for 15.9% of genes with a sum of normalized counts above 100, it was the case 
that a single sample's normalized count made up more than 90% of the sum over all samples.
the threshold for this warning is 10% of genes. See plotSparsity(dds) for a visualization of this.
We recommend instead using the varianceStabilizingTransformation or shifted log (see vignette).

So if I understand this correctly, among the genes whose normalized counts sum to more than 100, 15.9% have a single sample with a very large count that accounts for over 90% of that sum.
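To make the quantity in the warning concrete, here is a minimal base R sketch of the check as the message describes it. The matrix `norm_counts` and its values are made up for illustration, and this is not presented as DESeq2's internal code; the thresholds (100, 0.9) are the ones printed in the warning.

```r
# Toy matrix of normalized counts (hypothetical values):
# rows are genes, columns are samples.
norm_counts <- matrix(
  c(1000,   0,   0,   0,   # gene1: one sample holds 100% of the sum
     200, 250, 300, 250,   # gene2: counts spread evenly
       5,   3,   0,   2,   # gene3: sum is below 100, so it is skipped
     950,  20,  10,  20),  # gene4: one sample holds 95% of the sum
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)

row_sum <- rowSums(norm_counts)
row_max <- apply(norm_counts, 1, max)

# Only genes with a sum of normalized counts above 100 are considered.
considered <- row_sum > 100

# Share of each considered gene's total that comes from its largest sample.
max_share <- (row_max / row_sum)[considered]

# Fraction of considered genes where one sample contributes more than 90%.
frac_dominated <- mean(max_share > 0.9)
print(frac_dominated)
```

With real data, the same arithmetic could be applied to `counts(dds, normalized = TRUE)`; the warning fires when this fraction exceeds 10% of genes.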

However, I am not sure if this matters while doing a PCA analysis? I tried doing the PCA with both rld and vsd transformed data and the plots look very different. So could you help me understand which method is preferred/recommended in this case?

I have RNAseq count data from HTSeq counts. There are 6 replicates each in the control and affected group and I am interested in looking at the differentially expressed genes between the 2 groups. I am doing the PCA as more of a quality assessment step, to see if there are any outlier samples in the set. For my heatmaps, I use the vsd transformed data.

Thanks!

Shobana

 

@mikelove

"However, I am not sure if this matters while doing a PCA analysis?"

The rlog() warning is telling you not to use the rlog for this dataset, and to use the VST instead.

"So could you help me understand which method is preferred/recommended in this case?"

The point of the warning is to tell you not to use the rlog, but to use the VST or simply log2(count + 1), which can be performed with normTransform().
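As a rough sketch of what this could look like in practice, the following uses a simulated dataset from makeExampleDESeqDataSet() as a stand-in for the real counts (an assumption; substitute your own `dds`). The column name "condition" comes from the simulated data, and `blind = TRUE` is chosen here because the PCA is being used for quality assessment.

```r
library(DESeq2)

# Simulated stand-in for the real data: 1000 genes, 12 samples,
# 6 per condition, mirroring the design described in the question.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Variance stabilizing transformation, blind to the design for QC purposes.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Shifted log: log2(normalized count + 1).
dds <- estimateSizeFactors(dds)
ntd <- normTransform(dds)

# The visualization referred to in the rlog warning.
plotSparsity(dds)

# PCA on each transformation, colored by condition.
plotPCA(vsd, intgroup = "condition")
plotPCA(ntd, intgroup = "condition")
```

Both `vsd` and `ntd` are DESeqTransform objects, so the same plotPCA() call works on either, and `assay(vsd)` gives the transformed matrix for heatmaps.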

 

@shobana-sekar-6409

Okay, thanks! I'll use the vst then for my PCA plots. 

Thank you!
