I have a data set of 51 samples across 4 different conditions, and I want to visualise the similarity between the groups. I have already identified a known blood contamination which affects 7 of the samples, and have added a column named "contamination" to the sample data, with the labels "yes" or "no".
However, when I include this term in the design formula, it does not affect the appearance of the PCA plot. The plot looks the same as without the term, and the 7 samples still appear as outliers relative to the other samples of the same condition.
d.deseq <- DESeqDataSetFromMatrix(countData = raw_counts,
                                  colData = sample_data,
                                  design = ~ contamination + condition)
vsd <- vst(d.deseq, blind=FALSE)
pcaData <- plotPCA(vsd, intgroup=c("condition"), returnData = TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
p <- ggplot(pcaData, ...)
Thanks a lot for any help with troubleshooting and/or other suggestions on how to deal with the contamination.
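
For what it's worth, this behaviour is expected: with `blind=FALSE`, `vst()` uses the design only when estimating dispersions, and it does not remove the effect of any design term from the transformed values, so the PCA still shows the contamination. One commonly suggested option for visualisation is to remove that effect from the transformed matrix with `limma::removeBatchEffect()` while protecting `condition`. Below is a minimal sketch, assuming `contamination` is not fully confounded with `condition` and that a per-gene linear shift is a reasonable approximation of the contamination effect. The helper name `removeContaminationForPCA` is hypothetical and not part of DESeq2 or limma.

```r
library(DESeq2)
library(limma)

# Hypothetical helper: returns a copy of a transformed object (e.g. from vst())
# with the effect of a nuisance column removed from the assay values.
# Column names default to the ones used in the post.
removeContaminationForPCA <- function(vsd,
                                      nuisance = "contamination",
                                      keep = "condition") {
  cd <- as.data.frame(colData(vsd))

  # Design matrix for the effect we want to preserve (condition).
  mm <- model.matrix(reformulate(keep), data = cd)

  # Fit nuisance + condition per gene, then subtract only the nuisance part.
  adjusted <- limma::removeBatchEffect(assay(vsd),
                                       batch = cd[[nuisance]],
                                       design = mm)

  # Store the adjusted matrix in a copy so the original object is untouched.
  out <- vsd
  assay(out) <- adjusted
  out
}
```

Applied to the object from the post, `vsd_adj <- removeContaminationForPCA(vsd)` followed by `plotPCA(vsd_adj, intgroup = c("condition", "contamination"), returnData = TRUE)` would give the adjusted coordinates. The adjusted values are meant for plotting only; for differential expression testing, the raw counts with `~ contamination + condition` in the design remain the input.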