Question: DESeq2: Error when correcting batch effect
3 months ago by
Rimma0
Rimma0 wrote:

Hello, I'm struggling with batch correction for RNA-seq data in DESeq2. For example, my colData looks like this (10 samples: 6 controls and 4 treated, split across 2 batches):

samples   condition batch
100        PH7     1
101        PH7     1
103        PH7     1
63         PH7     1
64         ctr     1
74         ctr     1
75         ctr     1
76         ctr     2
88         ctr     2
99         ctr     2


As far as I understood from this post, my problem is that some conditions belong to only one batch; for example, all "PH7" samples are in batch 1. I tried what was suggested in the post:

mm = model.matrix(~ batch + condition, colData(dds))


Then I looked for columns that are all zeros, but I don't have any: every column has a 1 in at least one row.

Is there a way to run this analysis?

Answer: DESeq2: Error when correcting batch effect
3 months ago by
Michael Love26k
United States
Michael Love26k wrote:

You can just use ~batch + condition here. What is the error?

I tried, it shows this one:

  Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.


I don't get that error when I run this design and this column data. Maybe check your code?

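library(DESeq2)
# Simulated counts for 10 samples, with the batch/condition layout from the question
dds <- makeExampleDESeqDataSet(m=10)
dds$batch <- factor(rep(1:2,c(7,3)))
dds$condition <- factor(rep(2:1,c(4,6)))
design(dds) <- ~ batch + condition
dds <- DESeq(dds)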


I simplified the colData a bit for the post. Does it make a difference if my actual colData looks like this? The major difference I see is that the third batch contains a condition that does not appear in any other batch:

samples   condition batch
100        PH7     1
101        PH7     1
103        PH7     1
63         PH7     1
64         ctr     1
74         ctr     1
75         ctr     1
76         ctr     2
88         ctr     2
99         ctr     2
11         hbls    3
12         hbls    3
13         hbls    3


Otherwise, my code looks fine to me, but I will recheck it.

Yes it makes a difference. This is why it's good to try to describe your actual data, so we don't go back and forth while talking about different datasets.

In your actual dataset, you can't control for batch effects because your batch 3 is confounded with your condition there. This means that your results cannot be trusted entirely, regardless of what statistical method you use, because you can't tell batch 3 apart from that condition.
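The confounding can be checked directly from the sample table, without any count data. The following is a minimal base R sketch, assuming the 13-sample table posted above; the object names (`coldata`, `mm`, `mm_rank`, `any_zero_col`) are made up for illustration. It also shows why looking for all-zero columns finds nothing here: the rank deficiency comes from two identical columns, not from an empty one.

```r
# Sample table as posted in the thread (13 samples, 3 conditions, 3 batches).
coldata <- data.frame(
  condition = factor(rep(c("PH7", "ctr", "hbls"), c(4, 6, 3)),
                     levels = c("ctr", "PH7", "hbls")),
  batch     = factor(rep(1:3, c(7, 3, 3)))
)

# Cross-tabulation: hbls occurs only in batch 3, and batch 3 contains only hbls.
print(table(coldata$condition, coldata$batch))

# Model matrix for the design that triggers the error.
mm <- model.matrix(~ batch + condition, coldata)

# No column is entirely zero ...
any_zero_col <- any(colSums(mm != 0) == 0)

# ... but the rank is smaller than the number of columns,
# because the batch3 and conditionhbls columns are identical.
mm_rank <- qr(mm)$rank
cat("columns:", ncol(mm), " rank:", mm_rank, "\n")
print(all(mm[, "batch3"] == mm[, "conditionhbls"]))
```

For the simplified 10-sample table the same check gives a rank equal to the number of columns, which is consistent with the design running without error on that layout.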

While this doesn't solve that particular problem, my preferred approach to deal with the two batches within control at this point would be to use SVA to capture heterogeneity that is orthogonal to the condition. We have example code in the workflow on how to do this.
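A rough sketch of what the SVA route could look like is given here, loosely following the pattern used in the RNA-seq workflow. The assumptions are loud ones: the counts are simulated with `makeExampleDESeqDataSet` as a stand-in for the real data, the choice of 2 surrogate variables is arbitrary, and the filtering threshold is only illustrative. On simulated data with no hidden batch the surrogate variables will mostly capture noise, so this shows the mechanics only.

```r
library(DESeq2)
library(sva)

set.seed(1)

# Stand-in data: simulated counts for 13 samples (assumption, not the real dataset).
dds <- makeExampleDESeqDataSet(m = 13)
dds$condition <- factor(rep(c("PH7", "ctr", "hbls"), c(4, 6, 3)),
                        levels = c("ctr", "PH7", "hbls"))

# Normalized counts, keeping genes with some minimal expression.
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model (condition) and null model (intercept only); batch is left out.
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))

# Estimate surrogate variables; n.sv = 2 is an arbitrary choice for this sketch.
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the surrogate variables to the design in place of the batch term.
dds$SV1 <- svseq$sv[, 1]
dds$SV2 <- svseq$sv[, 2]
design(dds) <- ~ SV1 + SV2 + condition
dds <- DESeq(dds)
```

As noted above, this does not remove the confounding between batch 3 and the hbls condition; it only addresses unwanted variation that is not aligned with the condition.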

Sorry for this.

Yes, I understand the problem now...

Thank you for clarifications :)