Question

deseq2 - PCA plot


Prasad Siddavatam ▴ 150


Hi Michael,

I am analyzing a dataset (treatment vs. control), each condition with three replicates. In the PCA plot, one of the control replicates is positioned away from the other two, while the treatment samples cluster nicely. How do you deal with this scenario? Are any modifications to the DESeq2 analysis required?

Greatly appreciate your help.

Prasad

deseq2 • 5.2k views

ADD COMMENT • link updated 10.1 years ago by Bernd Klaus ▴ 610 • written 10.1 years ago by Prasad Siddavatam ▴ 150


Hi Prasad,

Can you post a picture of the PCA plot? See the FAQ on how to share images.

ADD REPLY • link 10.1 years ago Michael Love 43k


Hi Michael, thank you very much for the help. Here is the link to the plot:

http://i.imgur.com/q6fTtKp.png

ADD REPLY • link 10.1 years ago Prasad Siddavatam ▴ 150


The abnormal point I am talking about is located at the bottom left (purple square).

ADD REPLY • link 10.1 years ago Prasad Siddavatam ▴ 150


There's no right answer for what to do. I might slightly favor keeping it in the analysis, because it clusters with the other two samples on PC1, which is the primary axis of variation. The question can only be answered by generating more control samples. Would the new control samples stay close to the two you have, indicating that something might have gone wrong with that one sample and that you should remove it? Or would the control samples show a wide spread, indicating biological variation, in which case you should include it?

ADD REPLY • link 10.1 years ago Michael Love 43k


Thank you very much, Michael. In this scenario I don't have the option of re-sequencing the samples, so I am trying to include the existing sample in the analysis and explain the reasons for its inclusion.

Luckily, my heatmap shows that all the control samples cluster together in the hierarchical clustering.

ADD REPLY • link 10.1 years ago Prasad Siddavatam ▴ 150

score 1 · Answer 1 · 2014-12-16

Hi Prasad,

The simplest thing to do is to remove the third control sample and proceed with the analysis. An outlier sample can increase the variability for a number of genes, potentially leading to higher dispersion estimates and thus less power to call DE genes.
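A minimal sketch of what removing one sample could look like. It uses the simulated dataset from `makeExampleDESeqDataSet` as a stand-in for the real data, and the choice of `"sample3"` as the outlier is purely hypothetical. Fitting both versions lets you compare how many genes are called with and without the sample.

```r
library(DESeq2)

# Simulated stand-in for the real data: 6 samples, condition "A" (1-3) and "B" (4-6).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)

# Hypothetical choice: treat "sample3" as the outlying control replicate.
outlier <- "sample3"
dds_sub <- dds[, colnames(dds) != outlier]
dds_sub$condition <- droplevels(dds_sub$condition)

# Fit the model with and without the sample.
dds_full <- DESeq(dds)
dds_sub  <- DESeq(dds_sub)

res_full <- results(dds_full)
res_sub  <- results(dds_sub)

# Number of genes called at an adjusted p-value below 0.1 in each analysis.
n_full <- sum(res_full$padj < 0.1, na.rm = TRUE)
n_sub  <- sum(res_sub$padj < 0.1, na.rm = TRUE)
print(c(full = n_full, without_outlier = n_sub))
```

No change to the DESeq2 workflow itself is needed in this sketch; only the columns of the object passed to `DESeq` differ.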

If you want to dig deeper, you can produce MA plots / scatterplots of the outlier sample versus all the other samples, to see where exactly the differences are.
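As a rough illustration of the MA-plot idea, here is a base R sketch on simulated log2-scale values. The matrix `mat`, the sample names, and the 50 artificially shifted genes are all assumptions made for the example; with real data you would substitute a matrix of transformed counts (for instance from `assay()` on an rlog- or VST-transformed object).

```r
set.seed(42)
n_genes <- 2000

# Simulated log2-scale expression matrix (genes x samples); a stand-in for real data.
base <- rnorm(n_genes, mean = 8, sd = 2)
mat <- sapply(1:6, function(i) base + rnorm(n_genes, sd = 0.3))
colnames(mat) <- c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3")
rownames(mat) <- paste0("gene", seq_len(n_genes))

# Make ctrl3 an artificial outlier by shifting 50 genes.
mat[1:50, "ctrl3"] <- mat[1:50, "ctrl3"] + 3

outlier <- "ctrl3"
others <- setdiff(colnames(mat), outlier)

# One MA plot per comparison: outlier sample versus each other sample.
op <- par(mfrow = c(2, 3))
for (s in others) {
  A <- (mat[, outlier] + mat[, s]) / 2  # average log2 expression
  M <- mat[, outlier] - mat[, s]        # log2 difference
  plot(A, M, pch = 20, cex = 0.4, xlab = "A", ylab = "M",
       main = paste(outlier, "vs", s))
  abline(h = 0, col = "red")
}
par(op)

# Genes where the outlier differs most from the mean of the other controls.
m_vs_ctrl <- mat[, outlier] - rowMeans(mat[, c("ctrl1", "ctrl2")])
top_genes <- names(sort(abs(m_vs_ctrl), decreasing = TRUE))[1:10]
print(top_genes)
```

Genes that sit far from the M = 0 line in every panel are candidates for what sets the outlier apart.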

After all, the PCA only tells you that there are differences, not where they are.

As a third suggestion: compute the PCA manually (you can use the code of the plotPCA function) and inspect the loadings (they are called "rotation" in the prcomp output):

http://www.rdocumentation.org/packages/stats/functions/prcomp

Maybe some genes have very high loadings, i.e. they contribute strongly to a certain PC score. If this is the case, they might be the ones that differ between your outlier and the other samples.
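A sketch of the manual PCA, assuming the same general steps as `plotPCA` (select the most variable genes, then run `prcomp` on the transposed matrix). The data are simulated, and the sample names, the number of genes kept (500), and the size of the simulated effects are assumptions for illustration only.

```r
set.seed(42)
n_genes <- 2000

# Simulated log2-scale expression matrix (genes x samples); a stand-in for real data.
base <- rnorm(n_genes, mean = 8, sd = 2)
mat <- sapply(1:6, function(i) base + rnorm(n_genes, sd = 0.3))
colnames(mat) <- c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3")
rownames(mat) <- paste0("gene", seq_len(n_genes))

mat[1:200, 4:6] <- mat[1:200, 4:6] + 2              # simulated treatment effect
mat[201:230, "ctrl3"] <- mat[201:230, "ctrl3"] + 4  # genes specific to the outlier

# Keep the most variable genes, then run PCA with samples as rows.
rv <- apply(mat, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(500, length(rv)))]
pca <- prcomp(t(mat[select, ]))

# Share of variance explained by each component, and the sample scores.
percent_var <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * percent_var[1:2], 1))
print(pca$x[, 1:2])

# Loadings ("rotation"): genes with the largest absolute contribution to PC2.
loadings_pc2 <- pca$rotation[, "PC2"]
top_pc2 <- sort(abs(loadings_pc2), decreasing = TRUE)[1:10]
print(top_pc2)
```

In this simulation the outlier separates on PC2, so that is the component inspected; with real data, look at whichever component separates the outlying sample in `pca$x`.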

Best wishes,

Bernd