I have a data set of 51 samples over 4 different conditions, and I want to visualise the similarity between the groups. I have already identified a known blood contamination which affects 7 of the samples, and have added a column named "contamination", with the labels "yes" or "no".
However, including this term in the design does not change the PCA plot. It looks the same as without the term, and the 7 samples remain outliers relative to the other samples of the same condition.
Code:
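library(DESeq2)
library(ggplot2)

# Build the dataset with contamination as a covariate in the design
d.deseq <- DESeqDataSetFromMatrix(countData = raw_counts,
                                  colData = sample_data,
                                  design = ~ contamination + condition)
# Variance-stabilizing transformation, using the design (blind = FALSE)
vsd <- vst(d.deseq, blind = FALSE)
# PCA coordinates and percent variance explained per component
pcaData <- plotPCA(vsd, intgroup = c("condition"), returnData = TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
p <- ggplot(pcaData, ...)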
Thanks a lot for any help with troubleshooting, or other suggestions on how to deal with the contamination.
Thank you, sorry I missed that. Does this take the treatment condition into account when correcting the batch effect, or does the design have to be provided to the `removeBatchEffect` function? I've looked into the limma documentation, but the examples there start from other input objects.
You would just provide the batch variable to that function, not the full design.
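As a rough sketch of what that could look like, the snippet below applies `limma::removeBatchEffect` to the transformed values and passes only the contamination column as the batch. The data are simulated with `makeExampleDESeqDataSet` purely so the example runs on its own; the sample count, the `contamination` labels, and the `mat_corrected` / `vsd_corrected` names are illustrative assumptions, not the original poster's data.

```r
library(DESeq2)
library(limma)

# Simulated stand-in data (hypothetical): 2000 genes, 12 samples, 2 conditions
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Hypothetical contamination labels, not confounded with condition
dds$contamination <- factor(rep(c("no", "no", "yes"), 4))
design(dds) <- ~ contamination + condition

# Same transformation as in the question
vsd <- vst(dds, blind = FALSE)

# Remove the contamination effect from the transformed matrix,
# supplying only the batch variable
mat <- assay(vsd)
mat_corrected <- limma::removeBatchEffect(mat, batch = vsd$contamination)

# Put the corrected values into a copy of the object and redo the PCA
vsd_corrected <- vsd
assay(vsd_corrected) <- mat_corrected
pcaData <- plotPCA(vsd_corrected, intgroup = "condition", returnData = TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
```

With real data, the last block would be applied to the existing `vsd` object, using `vsd$contamination` as the batch. The corrected matrix is meant for visualisation only; the `contamination` term stays in the design for the differential expression analysis itself. limma's documentation also describes an optional `design` argument for the conditions to be preserved; it is left out here, following the reply above.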