I am analyzing RNA-seq data using DESeq2 I have 5 groups of samples each with 4 replicates.
Below is a PCA plot of the VST transformed value: as shown in the PCA plot, replicate 1 (R1) are separate from the others on the upper half of the PCA plot. There are some clustering of the replicate 2 samples as well (triangles) at the bottom.
I ran deseq2 with a design of
~ batch + group
I tried to visualize the PCA plot after batch effect was modeled following this post, using:
log(counts(dds, normalized=TRUE) + 1) - log(t( t(assays(dds)[["mu"]]) / sizeFactors(dds) ) + 1)
I thought some hidden batch effect is still not corrected. So, I decided to run SVA to get two surrogate variables and include that variables into the model as
~ SV1 + SV2 + group
dat <- counts(dds, normalized = TRUE) idx <- rowMeans(dat) > 1 dat <- dat[idx, ] # specify full model which include the variable of interest mod <- model.matrix(~ group, data=colData(dds)) # null model is the model with only intercept term mod0 <- model.matrix(~ 1, data=colData(dds)) # coded this way, svseq is the 2 latent variables not associated with group svseq <- svaseq(dat, mod, mod0, n.sv = 2) SV1 = svseq[,1] SV2 = svseq[,2]
However, I think there is still no clear separations between the groups and some batch effects are still there.
Any advice would be much appreciated. Thank you in advance!