Question: DESeq2 indirect batch effect removal
7 weeks ago by
University Hospital Jena, Germany
Nicolas Huber wrote:

Hi,

I use DESeq2 in R to test for differential expression. I already know the design is terrible, but this is what I have to work with ;)

Question 1: Can I remove the batch effect between two groups if their samples do not share a batch, but each of the two groups shares a batch with samples from a third group? (Groups are different genotypes, batches are different experiments.)

Example: I want to compare group gA (samples s1, s2, s3) with group gC (samples s8, s9, s10, s11). All gA samples come from batch bX and all gC samples come from batch bY, so no sample of group gA shares a batch with a sample from group gC, and with these two groups alone a batch effect removal would be impossible. However, there is also group gB (samples s4, s5, s6, s7), which has 2 samples from batch bX and 2 from batch bY.

myColData <- data.frame(row.names=c("s1","s2","s3","s4","s5","s6","s7","s8","s9","s10","s11"),
group=c("gA","gA","gA","gB","gB","gB","gB","gC","gC","gC", "gC"),
batch=c("bX","bX","bX","bX","bX","bY","bY","bY","bY","bY", "bY"))
print(myColData)

myCounts <- matrix(round(runif(1100, 0, 20)), ncol=11, nrow=100)


My approach would be to 1) create a DESeqDataSet with the known batch effect in the design, 2) run the DESeq function on it, and 3) extract the results with only gC and gA given in the contrast:

library(DESeq2)
myDE1 <- DESeqDataSetFromMatrix(myCounts, colData=myColData, design=~batch+group)
myDE2 <- DESeq(myDE1)
myDE3 <- results(myDE2, contrast=c("group", "gC", "gA"))


My 1st question: Is this a correct way to handle the batch effect?

My 2nd question: Let's assume there is no batch here, just the groups. DESeq2 will give different results when I give as input (step 1) the whole matrix with all samples and then compare (in step 3) only the groups I care about (as done above, input was gA, gB and gC but I am only interested in gC vs. gA), compared with when I give a matrix with only the groups of interest to begin with. Is one of the ways incorrect? Shouldn't the results (myDE03 and myDE13) theoretically be the same?

#input whole matrix, compare only gC and gA in the end
myDE01 <- DESeqDataSetFromMatrix(myCounts, colData=myColData, design=~group)
myDE02 <- DESeq(myDE01)
myDE03 <- results(myDE02, contrast=c("group", "gC", "gA"))
myDE03

#input only samples from gC and gA and compare them
myDE11 <- DESeqDataSetFromMatrix(myCounts[,c(1:3,8:11)], colData=myColData[c(1:3,8:11),], design=~group)
myDE12 <- DESeq(myDE11)
myDE13 <- results(myDE12, contrast=c("group", "gC", "gA"))
myDE13


Any help is appreciated

modified 6 weeks ago by Michael Love • written 7 weeks ago by Nicolas Huber
Answer: DESeq2 indirect batch effect removal
6 weeks ago by
Michael Love
United States
Michael Love wrote:
> myColData
    group batch
s1     gA    bX
s2     gA    bX
s3     gA    bX
s4     gB    bX
s5     gB    bX
s6     gB    bY
s7     gB    bY
s8     gC    bY
s9     gC    bY
s10    gC    bY
s11    gC    bY


DESeq2 will give different results when I give as input (step 1) the whole matrix with all samples and then compare (in step 3) only the groups I care about (as done above, input was gA, gB and gC but I am only interested in gC vs. gA), compared with when I give a matrix with only the groups of interest to begin with. Is one of the ways incorrect?

This is a FAQ in our vignette.

Thanks for the answer, I am just not sure I understand it correctly. So the first two code snippets I posted are the correct way to handle this situation with DESeq2?

The 2nd question (with the 3rd code snippet) is not about the batch effect. I just observed that results differ when having different matrix inputs, even though the samples that are compared are the same (just the presence of other samples changes the calculation apparently).


So the first two code snippets I posted are the correct way to handle this situation with DESeq2?

Yes
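One way to see why the bridging group gB matters is to check whether the design matrix for ~batch+group is full rank, with and without the gB samples. The following is a minimal base-R sketch, not DESeq2 code: the helper `full_rank` is a hypothetical name introduced only for this illustration, and the sample table is rebuilt from the colData shown above.

```r
# Rebuild the sample table from the question (same groups and batches).
colData <- data.frame(
  row.names = paste0("s", 1:11),
  group = factor(rep(c("gA", "gB", "gC"), times = c(3, 4, 4))),
  batch = factor(rep(c("bX", "bY"), times = c(5, 6)))
)

# Hypothetical helper: TRUE if every coefficient in the design is estimable,
# i.e. the model matrix has as many independent columns as coefficients.
full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data)
  qr(mm)$rank == ncol(mm)
}

# All three groups: gB has samples in both batches, so batch and group
# effects can be separated.
full_rank(~ batch + group, colData)    # TRUE

# Only gA and gC: batch and group are perfectly confounded.
acOnly <- droplevels(colData[colData$group != "gB", ])
full_rank(~ batch + group, acOnly)     # FALSE
```

With gB included, the batch coefficient is informed only by the gB samples (2 in bX, 2 in bY), so the gC vs. gA comparison leans on a batch estimate from four samples. Without gB, the design is not full rank and the batch term cannot be fit.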

I just observed that results differ when having different matrix inputs, even though the samples that are compared are the same (just the presence of other samples changes the calculation apparently).

Yes, this is expected. And it is in our FAQ in the vignette.
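As a rough intuition for why extra samples change the result for the same comparison, here is a toy analogy with an ordinary linear model on simulated data. This is an assumption-laden sketch and not DESeq2's negative binomial model: it only shows that when variability is estimated from all samples, the gC vs. gA estimate can stay the same while its standard error and degrees of freedom change once gB is included.

```r
# Toy analogy only: ordinary linear model on one simulated "gene".
set.seed(1)
group <- factor(rep(c("gA", "gB", "gC"), times = c(3, 4, 4)))
d <- data.frame(
  group = group,
  y = rnorm(11, mean = c(5, 6, 7)[as.integer(group)], sd = 1)
)

# (a) Fit with all three groups, then look at gC vs. gA (gA is the reference).
fit_all <- lm(y ~ group, data = d)

# (b) Fit with only gA and gC.
d_sub <- droplevels(d[d$group != "gB", ])
fit_sub <- lm(y ~ group, data = d_sub)

# Same estimate (difference of group means), but the residual variance is
# pooled over different sample sets, so the standard error and p-value differ.
coef(summary(fit_all))["groupgC", ]
coef(summary(fit_sub))["groupgC", ]
c(all = df.residual(fit_all), subset = df.residual(fit_sub))
```

In DESeq2 the analogous shared quantities include the dispersion estimates (and the size factors), which are computed from whichever samples are in the object, so results for gC vs. gA are not expected to be identical between the two inputs.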