deseq2 - PCA plot
@prasad-siddavatam-4508
United States

Hi Michael,

I am analyzing a dataset (treatment vs. control) with three replicates each. In the PCA plot, one of the control replicates is positioned away from the other two, while the treatment samples cluster nicely. How do you deal with this scenario? Are any modifications to DESeq2 required?

Greatly appreciate your help.

Prasad

Hi Prasad,

Can you post a picture of the PCA plot? See the FAQ on how to share images.

Hi Michael, thank you very much for the help. Here is the link to the plot:

http://i.imgur.com/q6fTtKp.png

The abnormal point I am talking about is the one at the bottom left (purple square).

There's no right answer for what to do. I might slightly favor keeping it in the analysis, because it clusters with the other two samples on PC1, which is the primary axis of variation. The question can only really be answered by generating more control samples. Would the new control samples stay close to the two you have, indicating that something might have gone wrong with that one sample and that you should remove it? Or would the control samples show a wide spread, indicating biological variation and that you should include it?

Thank you very much, Michael. In this scenario I don't have the option of re-sequencing the samples, so I am trying to include the existing sample in the analysis and explain the reasons for its inclusion.

Luckily, my heatmap shows that all three control samples cluster together in the hierarchical clustering.

Bernd Klaus
@bernd-klaus-6281
Germany

Hi Prasad,

The simplest thing to do is to remove the third control sample and proceed with the analysis: an outlier sample might increase the variability of a couple of genes, potentially leading to higher dispersion estimates and thus less power to call DE genes.
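To make the dispersion argument concrete, here is a minimal sketch in Python (standard library only) using a simple method-of-moments estimate for the negative binomial, where Var = mu + alpha * mu^2. This is not how DESeq2 estimates dispersion (it fits gene-wise estimates by likelihood and shrinks them toward a fitted trend), and the counts are hypothetical normalized counts for a single gene, so treat it only as an illustration of the direction of the effect.

```python
from statistics import mean, variance

def mom_dispersion(counts):
    """Method-of-moments NB dispersion for one gene.

    Var = mu + alpha * mu^2  =>  alpha = (s^2 - mu) / mu^2,
    floored at 0 when the counts are less variable than Poisson.
    """
    mu = mean(counts)
    s2 = variance(counts)  # sample variance, n - 1 denominator
    return max((s2 - mu) / mu ** 2, 0.0)

# Hypothetical normalized counts for one gene in the three controls;
# the third value plays the role of the outlying replicate.
controls = [100, 130, 400]
print(round(mom_dispersion(controls), 3))      # 0.614
print(round(mom_dispersion(controls[:2]), 3))  # 0.025
```

Dropping the outlying replicate lowers this gene's estimate from about 0.61 to about 0.025. A larger dispersion means a larger variance for the gene and hence a less certain fold change, which is where the loss of power comes from.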

If you want to dig deeper, you can produce MA plots or scatterplots of the outlier sample versus each of the other samples to see where exactly the differences are.

After all, the PCA only tells you that there are differences, not where they are.
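A minimal sketch of the MA-plot idea, again in Python: for each gene it computes M (log2 ratio) and A (average log2 expression) of the outlier against each other sample, then lists the genes whose |M| exceeds a threshold. The pseudocount of 1, the 4-fold cutoff, the gene names and the counts are all hypothetical; in R you would typically take normalized counts from counts(dds, normalized=TRUE) and plot M against A rather than threshold.

```python
import math

def ma_values(x, y, pseudocount=1.0):
    """(M, A) per gene for sample x versus sample y:
    M = log2 ratio, A = mean of the two log2 values."""
    pairs = []
    for xi, yi in zip(x, y):
        lx = math.log2(xi + pseudocount)
        ly = math.log2(yi + pseudocount)
        pairs.append((lx - ly, (lx + ly) / 2))
    return pairs

def flagged_vs_each(genes, outlier, others, min_abs_m=2.0):
    """For each other sample, list genes with |M| > min_abs_m
    (min_abs_m=2.0 corresponds to a 4-fold difference)."""
    result = []
    for other in others:
        ma = ma_values(outlier, other)
        result.append([g for g, (m, _a) in zip(genes, ma) if abs(m) > min_abs_m])
    return result

genes = ["geneA", "geneB", "geneC", "geneD"]
outlier = [500, 20, 1000, 7]                       # hypothetical normalized counts
other_controls = [[480, 25, 60, 7], [520, 18, 75, 6]]
print(flagged_vs_each(genes, outlier, other_controls))  # [['geneC'], ['geneC']]
```

The same loop accepts the treatment samples as well; a gene flagged against every other sample is a good candidate for what makes the outlier different.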

As a third suggestion, compute the PCA manually (you can reuse the code of the plotPCA function) and inspect the loadings (they are called "rotation" in the prcomp output):

http://www.rdocumentation.org/packages/stats/functions/prcomp

Maybe some genes have very high loadings, i.e. they contribute strongly to a certain PC score. If this is the case, they might be the ones that differ between your outlier and the other samples.
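The idea behind the loadings can be sketched outside R. plotPCA (by default on the 500 most variable genes) runs prcomp on the transposed matrix, so samples are rows and genes are columns; prcomp centers each gene, and the columns of rotation are the principal axes. The Python sketch below reproduces that on a toy matrix using power iteration with deflation on the gene covariance matrix. This is a swapped-in numerical method (prcomp itself uses a singular value decomposition), and the gene names and values are hypothetical, with one control replicate given an elevated outlier_gene.

```python
import math
import random

def center_columns(rows):
    """Subtract each gene's mean across samples (prcomp centers by default)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(xc):
    """Gene-by-gene sample covariance of a centered matrix (n - 1 denominator)."""
    n, p = len(xc), len(xc[0])
    return [[sum(xc[s][i] * xc[s][j] for s in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]

def principal_axes(rows, k=2, iters=1000, seed=0):
    """Approximate the first k principal axes (loadings, analogous to the
    columns of prcomp's rotation) by power iteration with deflation."""
    cov = covariance(center_columns(rows))
    p = len(cov)
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient: variance captured along this axis.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
        axes.append(v)
        # Deflate so the next pass converges to the next axis.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return axes

genes = ["treatment_gene", "outlier_gene", "stable_gene"]
# Rows are samples (ctrl1, ctrl2, ctrl3, trt1, trt2, trt3); values are
# hypothetical log-scale expression, with ctrl3 playing the outlier.
samples = [
    [5.0, 5.0, 8.0],
    [5.0, 5.0, 8.2],
    [5.0, 9.0, 8.1],
    [10.0, 5.0, 8.0],
    [10.0, 5.0, 8.1],
    [10.0, 5.0, 8.2],
]

for pc, axis in enumerate(principal_axes(samples, k=2), start=1):
    # The sign of an axis is arbitrary, so rank genes by absolute loading.
    ranked = sorted(zip(genes, axis), key=lambda gl: abs(gl[1]), reverse=True)
    print(f"PC{pc}:", [(g, round(abs(loading), 2)) for g, loading in ranked])
```

In this toy example PC1 is dominated by treatment_gene (the treatment vs. control split) and PC2 by outlier_gene, the axis that separates the odd control replicate. As with prcomp, rank genes by absolute loading, since the sign of each axis is arbitrary.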

Best wishes,

Bernd

Thank you very much, Bernd, for your detailed explanation.
