To the developers,
I'm a novice R user and also new to expression profiling analysis.
I am trying to do a differential expression analysis on a 2x2 factorial experiment (2 drought-contrasting genotypes and 2 contrasting conditions) with 4 replicates per genotype-by-condition combination, for a total of 16 experimental units.
Before running the differential expression analysis, I clustered the samples after transforming the counts with rlog. However, in the heatmap, and especially in the PCA with all the samples, one sample sits very far from the rest, with PC1 explaining 41% of the variance and PC2 explaining 25%. I was able to identify that sample. When I removed it, the heatmap and PCA improved: PC1 now explains 76% of the variance and PC2 explains 15%.
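The check described above can be sketched roughly as follows. This is only an illustration on simulated data from `makeExampleDESeqDataSet`, not the actual script or counts: the `genotype` and `condition` column names, the factor levels, and the outlier name `sample5` are all placeholders.

```r
library(DESeq2)

# Simulated stand-in for the real count data: 1000 genes x 16 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)

# Hypothetical 2x2 layout: 2 genotypes x 2 conditions x 4 replicates each.
dds$genotype  <- factor(rep(c("G1", "G2"), each = 8))
dds$condition <- factor(rep(rep(c("control", "drought"), each = 4), times = 2))

# rlog transformation, blind to the design, used here for QC only.
rld <- rlog(dds, blind = TRUE)

# PCA on all samples; percentVar holds the fraction of variance for PC1 and PC2.
pca_all <- plotPCA(rld, intgroup = c("genotype", "condition"), returnData = TRUE)
attr(pca_all, "percentVar")

# Sample-to-sample distances for the clustering heatmap.
sample_dists <- dist(t(assay(rld)))
heatmap(as.matrix(sample_dists), symm = TRUE)

# Drop the suspected outlier (placeholder name) and repeat the transformation and PCA.
outlier <- "sample5"
dds_sub <- dds[, colnames(dds) != outlier]
rld_sub <- rlog(dds_sub, blind = TRUE)
pca_sub <- plotPCA(rld_sub, intgroup = c("genotype", "condition"), returnData = TRUE)
attr(pca_sub, "percentVar")
```

Comparing the two `percentVar` vectors corresponds to the before/after percentages quoted above. The transformation is re-run on the reduced object rather than subsetting `rld`, since the rlog values depend on which samples are included.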
Is it technically acceptable to remove one sample? From blogs and other resources in the DESeq2 community, it seems that people recommend keeping all the samples in the analysis.
Please advise.