Question

RNA-seq batch correction using technical replicates profiled across batches

enricoferrero ▴ 660

@enricoferrero-6037

Switzerland

Hello,

I have the following (certainly not ideal) RNA-seq experimental design:

Batch 1 contains 40 samples with condition A + 12 samples with condition B
Batch 2 contains 40 samples with condition C + the same 12 samples with condition B

So, the 12 samples with condition B are technical replicates that have been profiled across the two batches.

I'm actually not interested in condition B; I need to compare the 40 samples with condition A from batch 1 with the 40 samples with condition C from batch 2.

Are there Bioconductor packages (or other methods/approaches) that will allow me to use the 12 technical replicates profiled across batches to correct for batch effects before performing a differential expression analysis?

I already came across RUVSeq (see this question) and I'm looking for alternative approaches.

Thank you!

deseq2 ruvseq sva edger limma • 2.4k views

updated 5.8 years ago by Gordon Smyth 51k • written 5.8 years ago by enricoferrero ▴ 660

score 2 · Answer 1 · 2019-02-17

Gordon Smyth 51k

@gordon-smyth

WEHI, Melbourne, Australia

I would use limma for an experiment like this. I would analyse all the samples together, including a batch effect term in the linear model and using duplicateCorrelation() to link the technical replicates of the same samples. (The duplicateCorrelation block variable is the same as the sample ID.) Differential expression can then be done normally between A and C and the batch correction will happen automatically.

The above model gets around the fact that conditions A and C do not occur together in the same batch. Condition A is compared with condition B within batch 1 and condition C is compared with B within batch 2. The difference between A and C, which is what you eventually want, is inferred from A - B as compared to C - B.
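The arithmetic behind this can be sketched with a toy example. Everything below is hypothetical: the expression values, the size of the batch shift, and the helper name `observed` are invented for illustration, and the sketch is noise-free and covers a single gene. It is not limma code; it only shows why subtracting the shared condition B within each batch cancels an additive batch effect on the log scale.

```python
# Hypothetical, noise-free log2 expression for one gene (all numbers invented).
true_mean = {"A": 5.0, "B": 7.0, "C": 6.0}   # condition means without batch effects
batch_shift = {1: 0.0, 2: 1.5}               # assumed additive batch effect (log scale)


def observed(condition, batch):
    """Observed value = condition mean + batch shift (no noise in this sketch)."""
    return true_mean[condition] + batch_shift[batch]


a_batch1 = observed("A", 1)
b_batch1 = observed("B", 1)
c_batch2 = observed("C", 2)
b_batch2 = observed("B", 2)

# Naive comparison: the condition difference is mixed with the batch shift.
naive_a_minus_c = a_batch1 - c_batch2

# Within-batch contrasts against the shared condition B cancel the batch shift.
a_minus_b = a_batch1 - b_batch1
c_minus_b = c_batch2 - b_batch2
corrected_a_minus_c = a_minus_b - c_minus_b

print("naive A - C:    ", naive_a_minus_c)
print("corrected A - C:", corrected_a_minus_c)
print("true A - C:     ", true_mean["A"] - true_mean["C"])
```

In this toy setting the naive difference absorbs the whole batch shift, while (A - B) - (C - B) recovers the true A - C difference. With real data the same cancellation happens inside the linear model, with noise and the replicate correlation handled by the fit.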

5.8 years ago • Gordon Smyth 51k

Thanks Gordon.

Please correct me if I'm wrong, but I think a potential problem with this approach is that batch is perfectly confounded with condition, so it would not be possible to fit a model of the kind ~ batch + condition.

This is why I would like to use the technical replicates profiled across batches to correct for the batch effect so that then I'd be able to fit a model of the kind ~ condition.

Would using duplicateCorrelation() and fitting a ~condition model be appropriate here?

5.8 years ago • enricoferrero ▴ 660

No, batch is not perfectly confounded with condition, because condition B is in both batches and, more than that, the exact same samples are in both batches. Presumably, the whole purpose of repeating the condition B samples was to deconfound the batches.

What I have suggested to you does exactly what you say you want to do -- it uses the technical replicates to do the batch correction, but in an organic way rather than ad hoc.
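The confounding point can be checked directly on the fixed-effects design. The sketch below is an illustration only, under stated assumptions: the column layout (intercept for condition B in batch 1, a batch 2 column, and condition A and C columns) and the helpers `matrix_rank` and `design_row` are hypothetical choices made for this example, and the replicate correlation is ignored. It compares the design in this thread with a hypothetical one in which condition B had been run in batch 1 only.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix by Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_row(condition, batch):
    # Assumed columns: intercept (B, batch 1), batch 2, condition A, condition C.
    return [1, int(batch == 2), int(condition == "A"), int(condition == "C")]


# One row per distinct (condition, batch) group; repeated rows do not change the rank.
b_in_both_batches = [design_row("A", 1), design_row("B", 1),
                     design_row("C", 2), design_row("B", 2)]
b_in_batch1_only = [design_row("A", 1), design_row("B", 1),
                    design_row("C", 2)]

n_coefficients = 4
print("B in both batches:", matrix_rank(b_in_both_batches), "of", n_coefficients)
print("B in batch 1 only:", matrix_rank(b_in_batch1_only), "of", n_coefficients)
```

When B appears in both batches the design has full column rank, so the batch and condition coefficients are all estimable. In the hypothetical design without B in batch 2, the batch 2 column is identical to the condition C column and the rank falls short of the number of coefficients, which is the perfect confounding the question was worried about.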

5.8 years ago • Gordon Smyth 51k

Awesome, thanks Gordon. I will do as you suggest and report back if I run into problems.

5.8 years ago • enricoferrero ▴ 660