Question: Batch-effect: batch in design and corrected batch with sva- DESeq2
Hi All,

I ran DESeq2 on my samples and got only two DE genes. When I checked, I found that my samples were collected at different time points, and the PCA shows a prominent batch effect. So I included batch in my design:

    library(DESeq2)

    ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                           directory = directory,
                                           design = ~ batch + condition)

where batch is:

    [1] 1 2 3 1 2 3
    Levels: 1 2 3
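A minimal base-R sketch of what the design ~ batch + condition encodes, using the batch vector above together with hypothetical condition labels (two groups, A and B, assumed to be balanced across batches; the real labels are not given in the post). It also shows why batch should be a factor: as a numeric vector it would enter the model as a single linear trend instead of one offset per batch.

```r
# Batch as in the post: six samples, three batches, stored as a factor.
batch <- factor(c(1, 2, 3, 1, 2, 3))
# Hypothetical condition labels, assumed for illustration only.
condition <- factor(c("A", "A", "A", "B", "B", "B"))

# Design matrix for ~ batch + condition: an intercept, two batch offsets
# (relative to batch 1) and one condition effect (B vs A).
mm <- model.matrix(~ batch + condition)
print(colnames(mm))

# The same batch coded as plain numbers becomes one linear covariate,
# so the model loses the separate offset per batch.
batch_num <- c(1, 2, 3, 1, 2, 3)
mm_numeric <- model.matrix(~ batch_num + condition)
print(ncol(mm_numeric))
```

With the factor coding the matrix has four linearly independent columns, so each batch gets its own baseline while the condition effect is estimated within batches.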

After including batch in the design, I get 15 DE genes. So far, so good.

But then I read another post saying that the batch effect can be adjusted using SVA, so I tried that as well:

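    library(sva)

    # normalized counts, keeping genes with a mean normalized count above 1
    dat  <- counts(dds, normalized = TRUE)
    idx  <- rowMeans(dat) > 1
    dat  <- dat[idx, ]

    # full model (with condition) and null model (intercept only)
    mod  <- model.matrix(~ condition, colData(dds))
    mod0 <- model.matrix(~ 1, colData(dds))

    # estimate the number of surrogate variables
    n.sv <- num.sv(dat, mod, method = 'leek')

    svseq <- svaseq(dat, mod, mod0, n.sv = 3)

    # add the surrogate variables to colData and include them in the design
    ddssva <- dds
    ddssva$SV1 <- svseq$sv[, 1]
    ddssva$SV2 <- svseq$sv[, 2]
    ddssva$SV3 <- svseq$sv[, 3]
    design(ddssva) <- ~ SV1 + SV2 + SV3 + condition

    ddssva <- DESeq(ddssva)
    resSV  <- results(ddssva)

n.sv was 4, but using 4 throws the error below, so I used 3, which worked fine: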

    svseq <- svaseq(dat, mod, mod0, n.sv = 4)
    Number of significant surrogate variables is:  4 
    Iteration (out of 5 ):Error in density.default(x, adjust = adj) : 'x' contains missing values
    In addition: Warning message:
    In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced
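One possible reading of the warning, offered as a hedged arithmetic sketch rather than a diagnosis: the pf() call in the message uses df2 = n - df1. Assuming the full model in that F-test is mod with the n.sv surrogate variables appended, six samples (the length of batch above) and a two-column mod (assuming condition has two levels) leave no residual degrees of freedom at n.sv = 4. pf() returns NaN when df2 is 0, which would then reach density(). The helper name residual_df is hypothetical.

```r
# Hypothetical helper: residual degrees of freedom for an F-test whose
# full model has the columns of mod plus n_sv surrogate variables.
residual_df <- function(n_samples, n_mod_cols, n_sv) {
  n_samples - (n_mod_cols + n_sv)
}

n_samples  <- 6  # length of the batch vector shown above
n_mod_cols <- 2  # model.matrix(~ condition), assuming two condition levels

# Residual df for n.sv = 2, 3 and 4.
df_left <- sapply(2:4, function(k) residual_df(n_samples, n_mod_cols, k))
print(df_left)

# An F-distribution p-value needs df2 > 0; with df2 = 0, pf() returns NaN.
p_nsv4 <- suppressWarnings(pf(2, df1 = 1, df2 = df_left[3]))
print(p_nsv4)
```

Under these assumptions n.sv = 3 still leaves one residual degree of freedom, which would be consistent with 3 running and 4 failing.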

Using this method (SVA), I get 51 DE genes, and only 5 of them overlap with the genes found with batch in the design. My question is: why is there such a difference between the results of the two methods, and which method should I use for my downstream analysis?
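To compare the two gene lists explicitly, a small helper along these lines could be used. It is a sketch: the function name de_overlap is hypothetical, it assumes each table has gene IDs as row names and a padj column (as returned by DESeq2's results()), and it uses a cutoff of 0.1, which matches the default alpha of results(). Genes with padj = NA are treated as not significant.

```r
# Hypothetical helper: compare significant genes from two results tables.
de_overlap <- function(res_a, res_b, alpha = 0.1) {
  # which() drops NA padj values (e.g. genes removed by independent filtering)
  sig_a <- rownames(res_a)[which(res_a$padj < alpha)]
  sig_b <- rownames(res_b)[which(res_b$padj < alpha)]
  list(
    only_a = setdiff(sig_a, sig_b),
    only_b = setdiff(sig_b, sig_a),
    both   = intersect(sig_a, sig_b)
  )
}

# Toy tables standing in for the batch-design and SVA-design results.
res_batch <- data.frame(padj = c(0.01, 0.05, 0.50, NA),
                        row.names = c("g1", "g2", "g3", "g4"))
res_sva   <- data.frame(padj = c(0.02, 0.40, 0.03, 0.08),
                        row.names = c("g1", "g2", "g3", "g4"))

ov <- de_overlap(res_batch, res_sva)
print(lengths(ov))
```

Looking at the genes in only_a and only_b (their padj and log2 fold changes in both tables) can show whether they are borderline calls near the cutoff or genuinely different.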

Any help would be appreciated! Thanks!

Answer: Batch-effect: batch in design and corrected batch with sva- DESeq2
Michael Love
With known batches, I tend to favor ~batch + condition. I'm not sure why you would get such a small overlap.

Thanks, Michael, for the answer.

I wasn't sure why I was getting such different numbers. I'll go ahead with ~ batch + condition.

Also, when I use n.sv = 2 (that is, only two SVs), I get only 4 DE genes.

Is it recommended to use a high or a low n.sv when running svaseq? Any suggestions?

I’ll let the SVA devels handle that one.

