I am using svaseq and DESeq2 together to remove surrogate variables from an RNA-seq dataset and call DEG. I ran the raw counts through DESeq2 and rlog transformed the data for processing by svaseq.

`dds <- DESeqDataSetFromMatrix(countData = counts, colData = colData, design = ~ condition)`

`dds <- DESeq(dds)`

`rld <- rlog(dds, blind = FALSE)`

`counts_deseq <- assay(rld)`

`pca = prcomp(t( counts_deseq ), scale=FALSE)`

Notice that samples A1 and B1 are separated in PC2. B1_0 and B1_3 are separated in PC1, as I hoped.

This is my svaseq R code. counts_seq contains the rlog transformed data produced above.

`mod <- model.matrix(~ condition, colData)`

`mod0 <- model.matrix(~ 1, colData[, -1])`

`svseq <- svaseq(counts_deseq, mod, mod0, n.sv=3)`

I used the R function below to "clean" the original input and obtain corrected counts.

cleaningY = function(y, mod, svaobj) { X = cbind(mod, svaobj$sv) Hat = solve(t(X)%*%X)%*%t(X) beta = (Hat%*%t(y)) P = ncol(mod) cleany = y-t(as.matrix(X[,-c(1:P)])%*%beta[-c(1:P),]) return(cleany) }

`counts_sva = cleaningY(counts_deseq, mod, svseq)`

`pca = prcomp(t( counts_sva ), scale=FALSE)`

The cleaned counts were then used to generate a PCA plot of PC1 x PC2, which showed that the two A1 and B1 batches are now more similar than before sva.

Then I added SV1 and SV2 to my colData and generated a DESeq dds object following http://www.bioconductor.org/help/workflows/rnaseqGene/#batch

`dds.sva <- dds dds.sva$SV1 <- svseq$sv[,1]`

`dds.sva$SV2 <- svseq$sv[,2]`

`design(ddssva) <- ~ SV1 + SV2 + condition`

rld.sva <- rlog(dds.sva, blind = FALSE) counts_deseq_sva <- assay(rld.sva) pca = prcomp(t( counts_deseq_sva ), scale=FALSE)

My colData now looks like this:

`sample condition SV1 SV2 sizeFactor`

A1_0 control -0.648430548730741 -0.0538911129250703 0.536015875141358

B1_0_1 control 0.284400077531668 0.828886823588163 0.800354347640333

B1_0_2 control 0.229646673150629 -0.394550325516213 0.615815976577249

B1_0_3 control 0.158381936289815 -0.338562050495541 0.731675847902015

A1_3 treatment -0.537353868258029 0.144903580282808 1.43186332691591

B1_3_2 treatment 0.269242789169143 -0.118951595650863 1.70605551398322

B1_3_3 treatment 0.244112940847514 -0.0678353192832832 2.33518790975716

When I plotted a PCA with the new rlog data, I observed a slight change from the original rlog data. However, it differs from the svaseq corrected PCA and still shows separation between the batches.

I was expecting the PCA with the rlog transformed with SV1 and SV2 accounted for to look like the cleaned svaseq PCA.

Am I doing something wrong? Any help is much appreciated.

Thank you!

There seem to be several copy-and-paste errors in your code, such as the last block where you compute counts_deseq_sva and then compute prcomp on counts_sva.

Thanks for catching that, I fixed it.