I generated PCA plots for my RNA-seq data (3 biological replicates per sample) using rld- and vsd-normalized counts (images below).
I then removed the three samples that looked like outliers on the PCA plots (circled in red). I also know that those three samples carry mutations in certain genes, whereas the other samples should be wild-type-like. When I regenerated the PCA plots with all samples except the three outliers, I got very different PCA plots using the rld- and vsd-normalized counts.
I am therefore wondering what could contribute to the differences. Also, the blue and red dots in the PCA plots represent samples processed in different experiments (i.e. different batches). Can I use design = ~ batch + condition to account for the batch effect? And how should I re-plot the PCA from the normalized counts after removing the batch effect?
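
To make the last question concrete, here is a minimal sketch of one commonly described way to re-plot the PCA with the batch effect removed for visualization: apply limma::removeBatchEffect to the transformed counts and then call plotPCA again. It runs on simulated data from makeExampleDESeqDataSet, not on my data, and the batch labels ("b1", "b2") and the object names (mat_corrected, vsd_corrected) are assumptions made up for illustration. As far as I understand, the corrected matrix is meant only for plotting, while design = ~ batch + condition handles the batch term during the actual testing.

```r
library(DESeq2)
library(limma)

set.seed(1)

# Simulated stand-in for the real experiment: 12 samples, conditions A and B.
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)

# Hypothetical batch labels, alternating so batch is not confounded with condition.
dds$batch <- factor(rep(c("b1", "b2"), times = 6))
design(dds) <- ~ batch + condition

# Variance-stabilizing transformation; blind = FALSE uses the design above.
vsd <- vst(dds, blind = FALSE)

# Remove the batch effect from the transformed values while protecting
# the condition effect through the design matrix.
mat <- assay(vsd)
mm <- model.matrix(~ condition, colData(vsd))
mat_corrected <- limma::removeBatchEffect(mat, batch = vsd$batch, design = mm)

# Put the corrected values into a copy of the object and redo the PCA.
vsd_corrected <- vsd
assay(vsd_corrected) <- mat_corrected
pca_data <- plotPCA(vsd_corrected, intgroup = c("condition", "batch"),
                    returnData = TRUE)

# Draw the plot itself.
plotPCA(vsd_corrected, intgroup = c("condition", "batch"))
```

The same steps should apply to an rld object from rlog(dds, blind = FALSE) in place of vsd. Comparing the plot before and after the correction would show how much of the separation between the blue and red dots is due to batch.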