Question: DESeq2: Error when correcting batch effect
Rimma wrote:

Hello, I'm struggling with batch correction for RNA-seq data in DESeq2. For example, my colData looks like this (10 samples: 6 controls + 4 treated, in 2 batches):

samples  condition  batch
100      PH7        1
101      PH7        1
103      PH7        1
63       PH7        1
64       ctr        1
74       ctr        1
75       ctr        1
76       ctr        2
88       ctr        2
99       ctr        2

As far as I understood from this post, my problem is that some conditions belong to only one batch; for example, all "PH7" samples are in batch 1. I tried what was suggested in the post:

mm <- model.matrix(~ batch + condition, colData(dds))

Then I looked for columns that are all zeros; however, I don't have any. Every column has a 1 in at least one row.
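A minimal base-R sketch of the check described above, using a hypothetical reconstruction of the 10-sample colData (not the poster's actual object). Besides scanning for all-zero columns, it compares the QR rank of the model matrix with its number of columns, which is what a full-rank design requires. Note that `batch` is coded as a factor here; if it were left numeric, `model.matrix` would treat it as a single continuous covariate rather than as batch indicators.

```r
# Hypothetical reconstruction of the 10-sample colData from the question
coldata <- data.frame(
  sample    = c("100", "101", "103", "63", "64", "74", "75", "76", "88", "99"),
  condition = factor(rep(c("PH7", "ctr"), c(4, 6)), levels = c("ctr", "PH7")),
  # batch is a factor, so each non-reference level gets its own indicator column
  batch     = factor(rep(c(1, 2), c(7, 3)))
)

mm <- model.matrix(~ batch + condition, coldata)
print(mm)

# Columns that are zero in every row (the check suggested in the linked post)
all_zero <- colnames(mm)[colSums(mm != 0) == 0]

# A design is full rank when its QR rank equals its number of columns
full_rank <- qr(mm)$rank == ncol(mm)

print(all_zero)   # character(0)
print(full_rank)  # TRUE
```

For this colData the model matrix has three columns (intercept, `batch2`, `conditionPH7`) and is full rank, which is consistent with `~ batch + condition` being fittable here.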

Is there a way to do this analysis?

Answer: DESeq2: Error when correcting batch effect
Michael Love wrote:

You can just use ~batch + condition here. What is the error?


I tried that, and I get this error:

  Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.
(reply by Rimma)
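When `checkFullRank` reports a rank-deficient design, one base-R way to see which columns are responsible is the pivot of the QR decomposition: with its default LINPACK routine, `qr()` moves columns that are (numerically) linear combinations of earlier columns past position `rank`. The helper `find_aliased_columns()` below is hypothetical, and the toy colData is made up purely for illustration; `limma::nonEstimable()` is a packaged alternative for the same question.

```r
# Hypothetical helper: names of model-matrix columns that are linear
# combinations of columns appearing earlier in the matrix
find_aliased_columns <- function(mm) {
  q <- qr(mm)
  if (q$rank == ncol(mm)) {
    return(character(0))  # full rank: nothing is aliased
  }
  # qr() (default LINPACK) pushes deficient columns past position q$rank
  colnames(mm)[q$pivot[(q$rank + 1):ncol(mm)]]
}

# Made-up toy colData: condition "c" occurs only in batch 3
toy <- data.frame(
  batch     = factor(c(1, 1, 2, 2, 3, 3)),
  condition = factor(c("a", "b", "a", "b", "c", "c"))
)
mm_toy <- model.matrix(~ batch + condition, toy)
aliased <- find_aliased_columns(mm_toy)
print(aliased)
```

Only one column is reported because a single linear dependency (batch 3 and condition "c" marking the same samples) is enough to make the whole design non-estimable.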

I don't get that error when I run this design and this column data. Maybe check your code?

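library(DESeq2)

# simulated data: 10 samples, 7 in batch 1 and 3 in batch 2
dds <- makeExampleDESeqDataSet(m=10)
dds$batch <- factor(rep(1:2,c(7,3)))
# 4 samples in one condition and 6 in the other, mirroring the posted colData
dds$condition <- factor(rep(2:1,c(4,6)))
design(dds) <- ~ batch + condition
dds <- DESeq(dds)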
(reply by Michael Love)

Thank you for your reply, Michael!

I simplified the colData a bit for the post. Does it make a difference if my actual colData looks like this? The main difference I see is that the third batch contains only a condition that doesn't appear in any other batch:

samples  condition  batch
100      PH7        1
101      PH7        1
103      PH7        1
63       PH7        1
64       ctr        1
74       ctr        1
75       ctr        1
76       ctr        2
88       ctr        2
99       ctr        2
11       hbls       3
12       hbls       3
13       hbls       3

Otherwise my code looks fine to me, but I will check it again.

(reply by Rimma)
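The confounding in this extended colData can be checked directly in base R. The sketch below rebuilds the 13-sample table (a hypothetical reconstruction, with batch coded as a factor) and shows that the `batch3` and `conditionhbls` columns of the model matrix are identical, so the matrix has five columns but only rank four. That is the situation `checkFullRank` rejects.

```r
# Hypothetical reconstruction of the full 13-sample colData
coldata <- data.frame(
  sample    = c("100", "101", "103", "63", "64", "74", "75",
                "76", "88", "99", "11", "12", "13"),
  condition = factor(rep(c("PH7", "ctr", "hbls"), c(4, 6, 3)),
                     levels = c("ctr", "PH7", "hbls")),
  batch     = factor(rep(c(1, 2, 3), c(7, 3, 3)))
)

mm <- model.matrix(~ batch + condition, coldata)
print(colnames(mm))
# "(Intercept)" "batch2" "batch3" "conditionPH7" "conditionhbls"

rank_mm <- qr(mm)$rank
print(c(rank = rank_mm, columns = ncol(mm)))  # rank 4, columns 5

# batch 3 and hbls mark exactly the same three samples
same_samples <- identical(unname(mm[, "batch3"]), unname(mm[, "conditionhbls"]))
print(same_samples)  # TRUE
```

Because one column is an exact copy of another, no method can assign the difference between those three samples and the rest to batch rather than to hbls (or vice versa).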

Yes it makes a difference. This is why it's good to try to describe your actual data, so we don't go back and forth while talking about different datasets.

In your actual dataset, you can't control for batch effects because your batch 3 is confounded with your hbls condition. This means that your results cannot be trusted entirely, regardless of what statistical method you use, because you can't tell batch 3 apart from that condition.

While this doesn't solve that particular problem, my preferred approach to deal with the two batches within control at this point would be to use SVA to capture heterogeneity that is orthogonal to the condition. We have example code in the workflow on how to do this.

(reply by Michael Love)
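A sketch of the SVA approach, following the general pattern of the Bioconductor RNA-seq workflow: run `svaseq()` from the sva package on normalized counts, then add the surrogate variables to the DESeq2 design ahead of condition. It uses simulated data from `makeExampleDESeqDataSet()` with condition labels mimicking the 13-sample colData; the low-count filter and the choice of `n.sv = 2` are assumptions for illustration, not recommendations. Since SVA estimates heterogeneity after accounting for condition, this can absorb the batch 1 vs 2 difference within the controls but does not resolve the batch 3 / hbls confounding.

```r
library(DESeq2)
library(sva)

# Simulated stand-in for the real data: 13 samples, conditions as in the thread
dds <- makeExampleDESeqDataSet(n = 1000, m = 13)
dds$condition <- factor(rep(c("PH7", "ctr", "hbls"), c(4, 6, 3)),
                        levels = c("ctr", "PH7", "hbls"))
design(dds) <- ~ condition

# svaseq() works on normalized counts; drop very low-count genes first
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model (with condition) and null model (intercept only)
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)  # n.sv = 2 is an assumption

# Add the surrogate variables to the design, ahead of condition
ddssva <- dds
ddssva$SV1 <- svseq$sv[, 1]
ddssva$SV2 <- svseq$sv[, 2]
design(ddssva) <- ~ SV1 + SV2 + condition
ddssva <- DESeq(ddssva)
res <- results(ddssva, contrast = c("condition", "PH7", "ctr"))
```

On real data the number of surrogate variables would normally be chosen by inspecting the data (for example with `num.sv()` from sva) rather than fixed in advance.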

Sorry about that.

Yes, I understand the problem now.

Thank you for clarifications :)

(reply by Rimma)