I am analyzing a dataset (treatment vs. control), each condition with three replicates. In the PCA plot, one of the control replicates is positioned away from the other two, while the treatment samples cluster tightly. How do you deal with this scenario? Are any modifications to the DESeq2 workflow required?
There's no right answer for what to do. I might slightly favor keeping it in the analysis, because it clusters with the other 2 samples on PC1, which is the primary axis of variation. The question can only really be answered by generating more control samples. Would the new control samples stay close to the 2 you have, indicating that something might have gone wrong with that one sample and you should remove it? Or would the control samples have a wide spread, indicating biological variation, in which case you should include it?
Thank you very much, Michael. In this scenario, I don't have the option of re-sequencing the samples, so I am trying to include the existing sample in the analysis and explain the reasons for its inclusion.
Luckily, my heatmap shows that all the control samples cluster together in the hierarchical clustering.
The simplest thing to do is just to remove the third control sample and proceed with the analysis: an outlier sample might increase the variability for a couple of genes, potentially leading to a higher dispersion estimate and thus less power to call DE genes.
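To see what removing a sample does in practice, here is a rough sketch that fits DESeq2 with and without one sample and compares the results. It is only an illustration: the data come from `makeExampleDESeqDataSet` (simulated counts, not your experiment), and "sample3" simply stands in for the suspect control replicate. With your own object you would subset by your real sample name.

```r
library(DESeq2)

# Simulated example data: 1000 genes, 6 samples
# (condition A = sample1-3, condition B = sample4-6).
set.seed(3)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Fit with all six samples.
dds_all <- DESeq(dds)
res_all <- results(dds_all)

# Drop the suspect sample ("sample3" is a placeholder name) and refit.
dds_sub <- dds[, colnames(dds) != "sample3"]
dds_sub <- DESeq(dds_sub)
res_sub <- results(dds_sub)

# Compare the number of genes called at the default 10% FDR ...
n_de <- c(all_samples    = sum(res_all$padj < 0.1, na.rm = TRUE),
          without_sample = sum(res_sub$padj < 0.1, na.rm = TRUE))
print(n_de)

# ... and the final dispersion estimates of the two fits.
print(summary(dispersions(dds_all)))
print(summary(dispersions(dds_sub)))
```

If the two fits give very similar gene lists, the decision matters little; if they differ a lot, the genes that change are worth a closer look before deciding.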
If you want to dig deeper, you can produce MA plots / scatterplots of the outlier sample versus all the other samples, to see where exactly the differences are.
After all, the PCA only tells you that there are differences, not where they are.
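A minimal sketch of such pairwise MA plots in base R follows. The matrix `logc` is simulated here and only stands in for log2-transformed normalized counts (for example something like `log2(counts(dds, normalized = TRUE) + 1)` from your own object); the sample names and the cutoff of 2 are arbitrary choices for illustration.

```r
# Simulated stand-in for log2 normalized counts of the three control samples.
set.seed(2)
logc <- matrix(rnorm(2000 * 3, mean = 7, sd = 0.25), nrow = 2000,
               dimnames = list(paste0("gene", 1:2000), paste0("ctrl", 1:3)))
# Shift 30 genes in ctrl3 so that it behaves like the outlier sample.
logc[1:30, "ctrl3"] <- logc[1:30, "ctrl3"] + 4

# M (log2 ratio) and A (mean log2 expression) for one sample against another.
ma_values <- function(x, y) {
  data.frame(A = (x + y) / 2, M = x - y, row.names = names(x))
}

suspect <- "ctrl3"
others  <- setdiff(colnames(logc), suspect)

# One MA plot per comparison: suspect sample vs each other replicate.
op <- par(mfrow = c(1, length(others)))
for (s in others) {
  ma <- ma_values(logc[, suspect], logc[, s])
  plot(ma$A, ma$M, pch = 20, cex = 0.4,
       xlab = "A: mean log2 expression", ylab = "M: log2 ratio",
       main = paste(suspect, "vs", s))
  abline(h = 0, col = "red")
}
par(op)

# Genes that deviate by more than 2 log2 units against every other replicate.
m_all   <- sapply(others, function(s) logc[, suspect] - logc[, s])
deviant <- rownames(logc)[rowSums(abs(m_all) > 2) == length(others)]
print(head(deviant))
```

A cloud centered on M = 0 with a handful of far-off genes suggests a few specific genes drive the separation, whereas a global tilt or fan shape points more toward a sample-wide technical problem.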
As a third suggestion: compute the PCA manually (you can use the code of the plotPCA function) and inspect the loadings (they are called "rotation" in the prcomp output).
Maybe some genes have very high loadings, i.e. they contribute strongly to a certain PC score: if this is the case, they might be the ones that differ between your outlier and the other samples.
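As a rough sketch of that idea, the code below follows the same steps plotPCA takes (keep the most variable genes, then run prcomp on the transposed matrix) and then ranks genes by their absolute loading. The matrix `mat` is simulated and merely stands in for your transformed data (e.g. `assay(vsd)` after a variance-stabilizing transformation); the sample names, `ntop = 500`, and the choice of `pc` are assumptions for the example.

```r
# Simulated stand-in for transformed expression values: 1000 genes x 6 samples.
set.seed(1)
mat <- matrix(rnorm(1000 * 6, mean = 8, sd = 0.3), nrow = 1000,
              dimnames = list(paste0("gene", 1:1000),
                              c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
# Make 20 genes deviate in one control sample so it behaves like the outlier.
mat[1:20, "ctrl3"] <- mat[1:20, "ctrl3"] + 4

# Keep the ntop most variable genes, then run the PCA with samples in rows.
ntop   <- 500
rv     <- apply(mat, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
pca    <- prcomp(t(mat[select, ]))

# Percent of variance explained by each component.
percent_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
print(percent_var)

# Sample scores: check which PC separates the suspect sample from its replicates.
print(pca$x[, 1:2])

# Loadings for that PC. In this simulation it is PC1; in real data set `pc`
# to whichever component isolates the sample in your plot.
pc        <- 1
loadings  <- pca$rotation[, pc]
top_genes <- names(sort(abs(loadings), decreasing = TRUE))[1:20]
print(top_genes)
```

The genes in `top_genes` are natural candidates to look up individually, for instance with a plot of their normalized counts across all samples.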
Hi Prasad,
Can you post a picture of the PCA plot? See the FAQ on how to share images.
Hi Michael, thank you very much for the help. Here is the link to the plot:
http://i.imgur.com/q6fTtKp.png
The abnormal point I am talking about is located at the bottom left (purple square).