I'm running an experiment with two conditions: one is the wild type and the other is a knockout of a gene. Each sample comes from a single mouse, and the samples were isolated on different days. Here is an overview:
Next, I checked Sex, Date, and Sex + Date as covariates and ran a PCA for each. I applied removeBatchEffect from limma to the transformed counts from DESeq2, and the PCA looked like this:
As one can see, when I correct for Sex, PC2 shows the difference in my condition of interest, whereas correcting for Date or Date + Sex moves it to PC1.
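For reference, here is a minimal sketch of this kind of correction on simulated data. The sample sheet, the matrix `mat` (a stand-in for something like `assay(vst(dds))`), and the effect sizes are all made up for illustration and are not my real data. It passes the genotype to the `design` argument of `removeBatchEffect` so that the GT signal is protected while Date is regressed out, and the corrected values are used for visualization only.

```r
library(limma)

set.seed(1)

# Hypothetical sample sheet: 8 mice, GT and Date are not confounded
meta <- data.frame(
  GT   = factor(rep(c("WT", "KO"), each = 4), levels = c("WT", "KO")),
  Date = factor(rep(c("d1", "d2"), times = 4))
)

# Simulated stand-in for the transformed counts: 500 genes x 8 samples
mat <- matrix(rnorm(500 * 8), nrow = 500)
is_d2 <- meta$Date == "d2"
is_ko <- meta$GT == "KO"
mat[1:50, is_d2]  <- mat[1:50, is_d2] + 2    # simulated isolation-date effect
mat[51:80, is_ko] <- mat[51:80, is_ko] + 2   # simulated genotype effect

# Keep the condition of interest in the design so it is not removed
design <- model.matrix(~ GT, data = meta)
corrected <- removeBatchEffect(mat, batch = meta$Date, design = design)

# PCA on the corrected matrix (samples as rows); for plotting only
pca <- prcomp(t(corrected))
var_explained <- summary(pca)$importance["Proportion of Variance", 1:2]
print(var_explained)
```

A second known covariate such as Sex could be passed through the `batch2` argument in the same call. The corrected matrix is only meant for PCA or heatmaps; for testing, the covariates go into the design formula instead.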
I also ran sva with the following model:
    mod <- model.matrix(~ GT, colData(ddsrun))
    mod0 <- model.matrix(~ 1, colData(ddsrun))
    svseq <- svaseq(dat, mod, mod0, n.sv = 2)
and the result is:
As a side question: the first column clearly shows the effect of the isolation date, but the second column doesn't correlate with Sex or Date, so I would not use it. Is there a good way to interpret this? Right now I would only compare the surrogate variables with known or likely covariates and not include them in the model directly. Otherwise I would introduce a batch variable whose meaning I don't know.
Based on this, I think that Sex doesn't have a big impact on the data, but Date does. That means I will use ~ Date + GT as the design formula for DESeq2. Additionally, I made an MDS plot of the Euclidean distances, and it also suggests that Date has a larger impact on the data than Sex.
Now I was wondering whether these steps are in the correct order and whether my conclusion is correct. I have another experiment with the same set-up, but there the GT effect is weaker and is masked by Date and/or Sex.
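To make the comparison with known covariates less of an eyeball judgment, one option is to compute how much of each surrogate variable is explained by each covariate. The sketch below uses only base R and simulated values: `sv` is a hypothetical stand-in for `svseq$sv`, and the sample sheet is invented for illustration.

```r
set.seed(42)

# Hypothetical sample sheet for 8 mice
meta <- data.frame(
  GT   = factor(rep(c("WT", "KO"), each = 4), levels = c("WT", "KO")),
  Date = factor(rep(c("d1", "d2"), times = 4)),
  Sex  = factor(c("F", "M", "M", "F", "M", "F", "F", "M"))
)

# Simulated stand-in for svseq$sv: SV1 tracks Date, SV2 is unrelated noise
sv <- cbind(
  SV1 = as.numeric(meta$Date == "d2") + rnorm(8, sd = 0.1),
  SV2 = rnorm(8)
)

# R^2 from regressing each surrogate variable on each known covariate
covariates <- c("Date", "Sex", "GT")
assoc <- sapply(covariates, function(v) {
  apply(sv, 2, function(s) summary(lm(s ~ meta[[v]]))$r.squared)
})

# Rows are surrogate variables, columns are covariates
print(round(assoc, 2))
```

A value close to 1 means the surrogate variable is essentially a known covariate (SV1 and Date in this simulation). A surrogate variable with low values everywhere, like SV2 here, is not explained by anything recorded in the sample sheet.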
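As a sketch of what that design looks like in code, the example below uses a simulated dataset from `makeExampleDESeqDataSet` with invented GT and Date columns attached. The object names and factor levels are assumptions for illustration, not my real data. The variable of interest is placed last in the formula, and the genotype comparison is requested explicitly as a contrast.

```r
library(DESeq2)

set.seed(1)

# Simulated counts: 1000 genes x 8 samples (not real data)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)

# Hypothetical covariates; GT and Date are not confounded
dds$GT   <- factor(rep(c("WT", "KO"), each = 4), levels = c("WT", "KO"))
dds$Date <- factor(rep(c("d1", "d2"), times = 4))

# Adjust for isolation date, test for genotype
design(dds) <- ~ Date + GT
dds <- DESeq(dds)

# KO versus WT, with the Date effect accounted for in the model
res <- results(dds, contrast = c("GT", "KO", "WT"))
summary(res)
```

With this setup the raw counts are modeled directly and Date is handled inside the model, so the batch-corrected matrix from removeBatchEffect is not used as input for testing.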