Question: How to handle strong batch effects
9 months ago, jovel_juan (Canada) wrote:

In RNA-seq experiments, batch effects can be very strong. Here are two situations I have often observed:

1. The same cell line is used to replicate an experiment, and the results of the two experiments are quite different (see the case 1 plot).

2. Two groups of patients (in this case three and two) are sampled at two different points in time, and they cluster according to sampling date.

I suspect that sequencing differs considerably between runs. We have seen this effect when sequencing bacterial genomes too: from a collection of 96 samples prepared simultaneously, we sequenced two sets of 48 separately. The two sets show quite different profiles, and the samples cluster exactly according to sequencing batch.

What should I do in DESeq2? Should the data be analyzed together, including the factor "batch" in the design? Something like:

ddsMat <- DESeqDataSetFromMatrix(countData = countdata, colData = sampleInfo, design = ~ group + batch)

Or should the two batches be analyzed separately?

Thanks!

Links to the figures mentioned above follow:

https://docs.google.com/a/ualberta.ca/viewer?a=v&pid=sites&srcid=dWFsYmVydGEuY2F8ampvdmVsfGd4OjNjOGE4OTViMDA1ODMyNGI

https://docs.google.com/a/ualberta.ca/viewer?a=v&pid=sites&srcid=dWFsYmVydGEuY2F8ampvdmVsfGd4OjcwNTQ2YTI4NjEwMjI0NTI 


Thanks Mike.

Just to let you know, the difference between ~ batch + condition and ~ condition + batch must be small: I get the same set of deregulated transcripts. The only difference I notice is that the FDR is a bit higher in the first case.

written 9 months ago by jovel_juan
9 months ago, Michael Love (United States) wrote:

The best approach is to put batch in the design (in the vignette, we recommend putting it first, e.g. ~ batch + condition).

Even if the batch effect is very large, doing this will control for any small or large shifts between batches, and will only look for differences due to condition that are consistent across the batches.

This is a general strategy across methods employing linear models.
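
To illustrate the general linear-model point, here is a minimal base R sketch. It is a toy, not DESeq2: it uses ordinary least squares on simulated log-scale values for a single gene, whereas DESeq2 fits a negative binomial GLM on counts. The sample layout, effect sizes, and variable names are all assumptions made up for the example.

```r
set.seed(42)

# Hypothetical balanced layout: 2 conditions x 2 batches, 5 samples per cell
condition <- factor(rep(c("ctrl", "trt"), each = 10))
batch     <- factor(rep(rep(c("b1", "b2"), each = 5), times = 2))

# Simulated log-expression for one gene:
# a condition effect of 1 and a much larger batch shift of 5
y <- 1 * (condition == "trt") + 5 * (batch == "b2") + rnorm(20, sd = 0.5)

# Model ignoring batch vs. model controlling for batch
fit_no_batch <- lm(y ~ condition)
fit_batch    <- lm(y ~ batch + condition)

# Standard error of the condition effect under each model
se_no_batch <- summary(fit_no_batch)$coefficients["conditiontrt", "Std. Error"]
se_batch    <- summary(fit_batch)$coefficients["conditiontrt", "Std. Error"]
print(c(without_batch = se_no_batch, with_batch = se_batch))

# Reordering the terms leaves the fitted condition coefficient unchanged
fit_reordered <- lm(y ~ condition + batch)
print(all.equal(coef(fit_batch)["conditiontrt"],
                coef(fit_reordered)["conditiontrt"]))
```

When batch is left out, the batch shift ends up in the residual variance and inflates the standard error of the condition effect; with batch in the model, that shift is absorbed by the batch coefficient. This only works when batch and condition are not completely confounded, as in the balanced layout assumed here.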


But order doesn't actually matter in the design, right? It only matters if you don't specify in results() which comparison you want?

written 9 months ago by swbarnes2

Right. But lots of users just use results() and so I repeat this advice just in case.

written 9 months ago by Michael Love
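
The point about results() in this exchange can be sketched as follows, assuming the DESeq2 package is installed. The counts are simulated with makeExampleDESeqDataSet(), and the batch column is a hypothetical label added only for illustration, so the simulated data contain no real batch effect. The sketch only shows which comparison results() returns by default and how to request one explicitly.

```r
library(DESeq2)

set.seed(1)
# Simulated data set: 12 samples, 'condition' with levels A and B (6 each)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical batch label, balanced across the two conditions
dds$batch <- factor(rep(c("b1", "b2"), times = 6))

# Same model, two term orders
design(dds) <- ~ batch + condition
dds_a <- DESeq(dds, quiet = TRUE)

design(dds) <- ~ condition + batch
dds_b <- DESeq(dds, quiet = TRUE)

# With no arguments, results() reports the last variable in the design:
# condition for dds_a, but batch for dds_b
print(mcols(results(dds_a))$description[2])
print(mcols(results(dds_b))$description[2])

# Naming the contrast explicitly requests the condition comparison
# whichever way the design was written
res_a <- results(dds_a, contrast = c("condition", "B", "A"))
res_b <- results(dds_b, contrast = c("condition", "B", "A"))
print(mcols(res_b)$description[2])
```

Writing the design as ~ batch + condition simply makes the default call to results() return the comparison most users want; with an explicit contrast the order of the terms stops mattering for which comparison is reported.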