Question

Batch effect with one sample being in two batches

lmrogers34

@lmrogers34-19639

Germany

I have bulk RNA-seq data with two conditions, diseased and healthy, and a simple design in which the samples are split across two plates. Each plate (batch) contains both conditions: 50 per cent healthy and 50 per cent diseased.

I am trying to correct for the batch effect of the two plates in DESeq2, where one sample is on both plates. Is there a way to do this that I am not aware of?

Or do I just use a PCA to check whether the batch correction has worked? In DESeq2 I use design = ~ plate + condition for my model.

BatchEffect DESeq2

22 months ago · lmrogers34

score 1 · Answer 1 · 2022-06-07

jeroen.gilis

@jeroengilis-21551

Belgium

First question: is it bulk data? Below, I will assume it is. I am assuming you have a bulk RNA-seq dataset with two conditions (say, healthy and diseased), where all samples of one condition are on one plate and all samples of the other condition are on the other plate, except for one sample, which is on both plates.

If I understand the design correctly (EDIT: I did not, see updated response in comments), you will never be able to disentangle the condition effect and the plate effect. This is often called "perfect confounding" between the variable of interest and a nuisance factor.

Correcting for (removing) the plate effect can, in principle, be done in two ways. If both conditions are present on both plates (e.g., 4 healthy and 4 diseased samples on each plate), it can simply be done by adding a fixed plate effect to your model, as you suggested. If each sample (of a single condition) is present on both plates (e.g., by splitting the tissue in half), one can think of applying batch correction methods, which aim to remove the technical plate effect while retaining the relevant condition effect. Example methods are Harmony, Seurat CCA, etc.

However, neither strategy will work for you. The analysis with the plate + condition formula cannot be performed, since it essentially estimates the condition effect (healthy vs diseased) within each plate. For that, you need both conditions to be present on both plates. The second strategy, batch correction, would need to learn a batch correction based on a single sample. This will also not work; I expect the batch correction method to return an error, and even if it doesn't, the results should not be trusted.
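To make the confounding argument concrete, here is a minimal sketch in plain Python (not R) that builds treatment-coded design matrices for ~ plate + condition under two hypothetical layouts and computes their rank. The helper names `matrix_rank` and `design_matrix` and the 8-sample layouts are illustrative assumptions. When every healthy sample is on plate 1 and every diseased sample is on plate 2, the plate and condition columns are identical, so the matrix has rank 2 for 3 columns and the two effects cannot be separated; with both conditions on both plates it is full rank. DESeq2 runs its own full-rank check and stops with an error for a design like the confounded one.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination (Fractions avoid rounding)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a pivot row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column is a combination of earlier ones: no new rank
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(samples):
    """Treatment-coded columns for ~ plate + condition:
    intercept, plate2 indicator, diseased indicator (hypothetical level names)."""
    return [[1, int(plate == "plate2"), int(cond == "diseased")]
            for plate, cond in samples]


# Hypothetical layouts with 4 samples per plate.
confounded = [("plate1", "healthy")] * 4 + [("plate2", "diseased")] * 4
balanced = ([("plate1", "healthy")] * 2 + [("plate1", "diseased")] * 2
            + [("plate2", "healthy")] * 2 + [("plate2", "diseased")] * 2)

for name, layout in [("confounded", confounded), ("balanced", balanced)]:
    X = design_matrix(layout)
    print(name, "rank", matrix_rank(X), "of", len(X[0]), "columns")
# confounded rank 2 of 3 columns
# balanced rank 3 of 3 columns
```

The rank deficit in the confounded case is exactly the "perfect confounding" described above: no amount of extra data in that layout lets the model attribute a difference to plate rather than condition.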

Jeroen

22 months ago · jeroen.gilis

Hi Jeroen, thanks for the reply.

For the sake of simplicity, I am describing a single condition variable (diseased vs healthy), and it is in fact bulk data. However, the samples are split across the plates: 50 per cent of the healthy samples are on plate 1 and 50 per cent on plate 2, and the same holds for the diseased samples. One sample, patient 1, who is healthy, is on both plates in order to see the difference between the plates. So plate and condition are not confounded. I was just wondering whether there is any way to tell the model that a sample should be the same across both plates.

22 months ago · lmrogers34

Great, that really changes the design for the better!

I have to say, though, that having just one patient on both plates is quite unusual in my experience; I typically encounter either no patients on both plates or all patients on both plates. Those two scenarios call for different designs.

The former scenario should be analyzed with the plate + condition design, which again just estimates the condition effect while correcting for the plate effect. The latter scenario should not be analyzed with that design, because it does not acknowledge that two samples come from the same patient. This would require a different modelling approach, like the one suggested in the edgeR user guide part 3.5 (just change the treatment variable there to a plate variable and imagine just two disease states): https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Long story short, all your samples except the patient 1 are following the first design, so plate + condition will work for those samples. However, I am a bit uncomfortable with patient 1 appearing twice, as that is not acknowledged by that model. If you are being pragmatic, you could argue that ignoring this for just 1 patient will not strongly impact your analysis, also depending on how many samples you have in total. If you are being strict, I would consider removing patient 1 from the analysis (altogether or from one of the plates).

I am interested in other people's take on this, but I would really advise using the latter strategy.

Jeroen

22 months ago · jeroen.gilis

I agree with Jeroen's points above, including the concern about repeating one sample on both plates in the statistical analysis. For DESeq2 with fixed effects, I think you'd just drop one of the technical replicates. Alternatively, you could use random-effects modelling.

There is one family of methods that makes use of technical replicates: RUV. But I'm not sure it's worth it, or very efficient, with just a single technical replicate.

22 months ago · Michael Love

Yeah, it is just one patient sample repeated. Maybe I should explain a bit more. Each patient sample is split into 3, so I already have triplicates for each patient. For the one patient who is on both plates, I have 6 samples. So if I collapse the wells of each set of technical replicates into one sample, should it be OK?

The biologist thought that by doing the experiment with the same sample on both plates we would be able to see the variation more clearly.
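A minimal sketch of what collapsing the triplicate wells could look like, written in plain Python with hypothetical well IDs and toy counts for two genes (illustrative numbers only, not real data). Counts of wells that share a patient and plate are summed into one sample, which mirrors what DESeq2's collapseReplicates does for technical replicates (it sums counts within levels of a grouping factor). Because patient 1 still ends up with one collapsed sample per plate, the sketch then drops one plate's copy, following the suggestion in this thread to remove one of the repeated samples; dropping the plate2 copy is an arbitrary choice, and the function names are hypothetical.

```python
# Hypothetical well-level metadata: well -> (patient, plate, condition).
# patient1 is the bridging sample with triplicate wells on both plates.
# A real layout would have several patients of each condition per plate.
wells = {
    "w1": ("patient1", "plate1", "healthy"),
    "w2": ("patient1", "plate1", "healthy"),
    "w3": ("patient1", "plate1", "healthy"),
    "w4": ("patient1", "plate2", "healthy"),
    "w5": ("patient1", "plate2", "healthy"),
    "w6": ("patient1", "plate2", "healthy"),
    "w7": ("patient2", "plate2", "diseased"),
    "w8": ("patient2", "plate2", "diseased"),
    "w9": ("patient2", "plate2", "diseased"),
}

# Toy counts for two genes per well (illustrative numbers only).
counts = {
    "w1": [10, 0], "w2": [12, 1], "w3": [11, 0],
    "w4": [15, 2], "w5": [14, 1], "w6": [16, 2],
    "w7": [30, 5], "w8": [28, 6], "w9": [31, 4],
}


def collapse_replicates(wells, counts):
    """Sum counts over technical-replicate wells sharing (patient, plate, condition)."""
    summed = {}
    for well, key in wells.items():
        if key not in summed:
            summed[key] = [0] * len(counts[well])
        summed[key] = [a + b for a, b in zip(summed[key], counts[well])]
    return summed


def drop_bridge_copy(samples, patient, plate):
    """Remove one plate's copy of the repeated patient so each patient appears once."""
    return {key: c for key, c in samples.items()
            if not (key[0] == patient and key[1] == plate)}


samples = collapse_replicates(wells, counts)
analysis_set = drop_bridge_copy(samples, "patient1", "plate2")
for key, gene_counts in sorted(analysis_set.items()):
    print(key, gene_counts)
```

Summing (rather than averaging) keeps the values as integer counts, which is what count-based models such as DESeq2 expect.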

22 months ago · lmrogers34

I guess you can either go the route of random-effects modelling with limma's duplicateCorrelation, or look into RUV methods for estimating technical variation from a single repeated sample, but we don't have support for dealing with this in DESeq2.

22 months ago · Michael Love

Cool, thanks Michael. I think I will just ignore the sample and tell the lab not to do this again. Thanks for all your help.

22 months ago · lmrogers34