I am using svaseq and DESeq2 together to remove surrogate variables from an RNA-seq dataset and call DEG. I ran the raw counts through DESeq2 and rlog transformed the data for processing by svaseq.
dds <- DESeqDataSetFromMatrix(countData = counts, colData = colData, design = ~ condition)dds <- DESeq(dds)rld <- rlog(dds, blind = FALSE)counts_deseq <- assay(rld)pca = prcomp(t( counts_deseq ), scale=FALSE)

Notice that samples A1 and B1 are separated in PC2. B1_0 and B1_3 are separated in PC1, as I hoped.
This is my svaseq R code. counts_seq contains the rlog transformed data produced above.
mod <- model.matrix(~ condition, colData)mod0 <- model.matrix(~ 1, colData[, -1])svseq <- svaseq(counts_deseq, mod, mod0, n.sv=3)
I used the R function below to "clean" the original input and obtain corrected counts.
cleaningY = function(y, mod, svaobj) {
X = cbind(mod, svaobj$sv)
Hat = solve(t(X)%*%X)%*%t(X)
beta = (Hat%*%t(y))
P = ncol(mod)
cleany = y-t(as.matrix(X[,-c(1:P)])%*%beta[-c(1:P),])
return(cleany)
}
counts_sva = cleaningY(counts_deseq, mod, svseq)pca = prcomp(t( counts_sva ), scale=FALSE)
The cleaned counts were then used to generate a PCA plot of PC1 x PC2, which showed that the two A1 and B1 batches are now more similar than before sva.

Then I added SV1 and SV2 to my colData and generated a DESeq dds object following http://www.bioconductor.org/help/workflows/rnaseqGene/#batch
dds.sva <- dds dds.sva$SV1 <- svseq$sv[,1]dds.sva$SV2 <- svseq$sv[,2]design(ddssva) <- ~ SV1 + SV2 + conditionrld.sva <- rlog(dds.sva, blind = FALSE) counts_deseq_sva <- assay(rld.sva) pca = prcomp(t( counts_deseq_sva ), scale=FALSE)
My colData now looks like this:
sample condition SV1 SV2 sizeFactor
A1_0 control -0.648430548730741 -0.0538911129250703 0.536015875141358
B1_0_1 control 0.284400077531668 0.828886823588163 0.800354347640333
B1_0_2 control 0.229646673150629 -0.394550325516213 0.615815976577249
B1_0_3 control 0.158381936289815 -0.338562050495541 0.731675847902015
A1_3 treatment -0.537353868258029 0.144903580282808 1.43186332691591
B1_3_2 treatment 0.269242789169143 -0.118951595650863 1.70605551398322
B1_3_3 treatment 0.244112940847514 -0.0678353192832832 2.33518790975716
When I plotted a PCA with the new rlog data, I observed a slight change from the original rlog data. However, it differs from the svaseq corrected PCA and still shows separation between the batches.

I was expecting the PCA with the rlog transformed with SV1 and SV2 accounted for to look like the cleaned svaseq PCA.
Am I doing something wrong? Any help is much appreciated.
Thank you!

There seem to be several copy-and-paste errors in your code, such as the last block where you compute counts_deseq_sva and then compute prcomp on counts_sva.
Thanks for catching that, I fixed it.