Question

sample removal after clustering

tarun2 (United States)

To the developers,

I'm a novice R user and new to expression profiling analysis as well.

I am trying to do a differential expression analysis on a 2x2 factorial experiment (2 genotypes with contrasting drought responses and 2 contrasting conditions) with 4 replicates per genotype-condition combination, giving 16 experimental units.

Before running the differential expression analysis, I clustered the samples after transforming the counts with rlog. However, in the heatmap and especially in the PCA of all samples, one sample lies far from the rest; PC1 explains 41% of the variance and PC2 25%. I was able to identify that sample. When I removed it, the heatmap and PCA improved: PC1 now explains 76% of the variance and PC2 15%.
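In DESeq2 these checks are usually done in R, as in the DESeq2 vignette: `plotPCA(rld, intgroup = c("genotype", "condition"))` and a heatmap of `dist(t(assay(rld)))`, where `rld` is the rlog output and the colData column names are assumed. The Python below is only a language-neutral sketch of the same sample-to-sample distance idea. The flagging rule is an assumption, not something DESeq2 does: a sample is flagged when its average Euclidean distance to all other samples exceeds `ratio` times the median of those averages. The function names, the cutoff of 3 and the toy data are hypothetical.

```python
import math
import statistics


def mean_distances(samples):
    """Average Euclidean distance from each sample to every other sample.

    `samples` maps a sample name to its vector of transformed (log-scale,
    rlog-like) expression values, with the same gene order for every sample.
    """
    names = list(samples)
    return {
        a: statistics.fmean(math.dist(samples[a], samples[b])
                            for b in names if b != a)
        for a in names
    }


def flag_far_samples(samples, ratio=3.0):
    """Flag samples whose mean distance to the others exceeds `ratio` times
    the median of all samples' mean distances (a simple, hypothetical rule)."""
    md = mean_distances(samples)
    med = statistics.median(md.values())
    return [name for name, d in md.items() if d > ratio * med]


if __name__ == "__main__":
    # Invented toy data: 16 samples x 4 genes on a log scale; "s16" is
    # shifted on every gene to mimic a failed library.
    toy = {f"s{i}": [5.0 + 0.1 * (i % 4), 7.0, 3.0 - 0.1 * (i % 3), 6.0]
           for i in range(1, 16)}
    toy["s16"] = [9.0, 11.0, 0.0, 2.0]
    print(flag_far_samples(toy))  # → ['s16']
```

A flagged sample is a candidate for follow-up (for example, checking its reads with FastQC), not automatically one to drop.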

Is it technically acceptable to remove one sample? I ask because, from blogs and other resources in the DESeq2 community, it seems that people recommend keeping all the samples together.

Please advise.

deseq2 rlog transformation

Written by tarun2; updated by Michael Love.

Answer (2018-01-18)

Yes, it's a good idea to remove an outlier if you think the experiment failed in some way. The problem could have arisen in the experiment itself, in the library preparation, or in the sequencing. I would use FastQC to check the sequencing quality: if the problem was in library preparation or sequencing, you may be able to avoid it in the future. You could also contact a local bioinformatics core to help you understand what may have gone wrong; if everything there looks correct, the problem may have been in the experiment itself.

Either way, for the DESeq2 analysis I would recommend removing the sample.
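In R, removing a sample amounts to subsetting the columns of the DESeqDataSet, e.g. `dds <- dds[, colnames(dds) != "sample_X"]` (the sample name is hypothetical), and then running `DESeq(dds)` again so that size factors and dispersions are re-estimated without it. The Python below is a language-neutral sketch of the bookkeeping only: it drops one sample from a count table and its metadata, then tallies replicates per genotype x condition cell. The data layout, function names and sample names are assumptions for illustration.

```python
from collections import Counter


def drop_sample(counts, coldata, sample):
    """Remove one sample from a count table and its metadata.

    counts:  dict gene -> dict sample -> integer count
    coldata: dict sample -> (genotype, condition)
    Returns new (counts, coldata); the inputs are left unchanged.
    """
    if sample not in coldata:
        raise KeyError(f"unknown sample: {sample}")
    new_counts = {g: {s: c for s, c in row.items() if s != sample}
                  for g, row in counts.items()}
    new_coldata = {s: v for s, v in coldata.items() if s != sample}
    return new_counts, new_coldata


def replicates_per_cell(coldata):
    """Number of samples in each genotype x condition cell."""
    return Counter(coldata.values())


if __name__ == "__main__":
    # Hypothetical 2x2 design with 4 replicates per cell (names invented).
    coldata = {f"{g}_{c}_{r}": (g, c)
               for g in ("G1", "G2") for c in ("control", "drought")
               for r in range(1, 5)}
    counts = {"gene1": {s: 100 for s in coldata}}
    counts, coldata = drop_sample(counts, coldata, "G2_drought_3")
    print(replicates_per_cell(coldata)[("G2", "drought")])  # → 3
```

After the removal one cell has 3 replicates and the others 4; DESeq2 fits a GLM and does not require a balanced design, so this can be analyzed as is.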

The advice you read about keeping all the samples together concerns a separate question: whether to include samples from all conditions when estimating dispersions, rather than analyzing each pair of conditions separately.
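A rough way to see why pooling helps, assuming the full interaction design `~ genotype + condition + genotype:condition` (p = 4 coefficients) and the 15 samples left after removing the outlier: each gene's dispersion is estimated from residual variation, and the residual degrees of freedom are n - p. The pairwise case below assumes two cells of 4 samples each.

$$
\begin{aligned}
\text{all groups together: } n = 15,\ p = 4 &\quad\Rightarrow\quad n - p = 11 \\
\text{one pair of groups: } n = 8,\ p = 2 &\quad\Rightarrow\quad n - p = 6
\end{aligned}
$$

These counts are illustrative only; DESeq2 also shares information across genes when shrinking dispersion estimates, so this is not the full picture.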