Question: Batch effects between different sequencing runs
0
10 weeks ago by
A40
A40 wrote:

Hi all,

I would be really greatful if somebody could provide some insight about how to go about comparing two experiments that have both been sequenced at different times (and potentially in a different manner (for example single end vs paired end). I am trying to model the batch effect in DESeq2 in order to account for effects by batch but im running in to trouble and am unsure why:

I have two datasets with which i have combined the raw counts and the metadata.. all in order with regards to order of columns and rows of metadata etc... however, when i create the formula:

dds<-DESeqDataSetFromMatrix(countData = mergedCounts1, colData = metadataRobert, design = ~Batch+Group)


I get an error:

Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified.

As far as i am aware these are not the same as batch is simply batch 1 and batch 2, while the groups has multipls groups, control, trt 1, trt2, trt3 etc...

I guess the first question is, how to overcome this within the model. And the second question is, is a workaround to use the remove batch effect in limma on the VST counts and then re-plot the PCA and do downstream analysis?

This is particularly confusing for me as the variance is quite big between groups which also corresponds to batches.. however.. as the actual groups are completely different this may also be normal!!! so it is hard to tell if there is 90% variance or 70% variance because of a batch effect... I imagine if there were batch effects those samples that were truly similar between different sequencing types would be much closer to each other but not overlapping... not 90% different? I am unsure about how to tackle batch effects from this point of view...

thanks!

deseq2 batchcorrection • 119 views
modified 10 weeks ago by swbarnes2340 • written 10 weeks ago by A40

Please post your colData. You can do

as.data.frame(colData(dds))


thank you!

  SampleName
<fctr>
Organism
<fctr>
Replicate
<int>
Group
<fctr>
SequenceRun
<int>
Molecule
<fctr>
C11orf95_R1 EV_R1   Mus musculus    1   empty vector control    1   RNA
C11orf95_R2 EV_R2   Mus musculus    2   empty vector control    1   RNA
C11orf95_R3 EV_R3   Mus musculus    3   empty vector control    1   RNA
EV_R1   C11orf95_R1 Mus musculus    1   C11orf95    1   RNA
EV_R2   C11orf95_R2 Mus musculus    2   C11orf95    1   RNA
EV_R3   C11orf95_R3 Mus musculus    3   C11orf95    1   RNA
Fus1_R1 RELA_R1 Mus musculus    1   RELA WT 1   RNA
Fus1_R2 RELA_R2 Mus musculus    2   RELA WT 1   RNA
Fus1_R3 RELA_R3 Mus musculus    3   RELA WT 1   RNA
Fus2_R1 Fus1_R1 Mus musculus    1   RELA FUS1   1   RNA
Fus2_R2 Fus1_R2 Mus musculus    2   RELA FUS1   1   RNA
Fus2_R3 Fus1_R3 Mus musculus    3   RELA FUS1   1   RNA
RELA_R1 Fus2_R1 Mus musculus    1   RELA FUS2   1   RNA
RELA_R2 Fus2_R2 Mus musculus    2   RELA FUS2   1   RNA
RELA_R3 Fus2_R3 Mus musculus    3   RELA FUS2   1   RNA
M350.1  M350-1  Mus musculus    1   RHD 2   RNA
M350.2  M350-2  Mus musculus    2   RHD 2   RNA
M350.3  M350-3  Mus musculus    3   RHD 2   RNA
M350.4  M350-4  Mus musculus    4   RHD 2   RNA
M351.1  M351-1  Mus musculus    1   ZF  2   RNA
M351.2  M351-2  Mus musculus    2   ZF  2   RNA
M351.3  M351-3  Mus musculus    3   ZF  2


i hope this is clear enough? There is nothing really unusual (to me) that is immediately apparent...

if also add Batch1, Batch2 instead of 1 or 2 etc.. get the same error...

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by A40
Answer: Batch effects between different sequencing runs
0
10 weeks ago by
swbarnes2340
swbarnes2340 wrote:

If "SequenceRun" is batch, this is hopelessly confounded. It doesn't matter that multiple groups are found in each batch, what matter is that all the members of each group are found in only one batch.

Yes, agree with this. You can't control for batch because it is confounded with the biological condition of interest. This is explicitly discussed in the vignette section that the "checkFullRank" error points you to.

I'll also note for the future that being sequenced at different times doesn't really matter. You can put things on the sequencing instrument whenever you want, that doesn't introduce technical artifacts. Prepping the RNA on different days, making the libraries on different days, that adds technical variation. You could probably dump read 2 to make the paired end samples look more like single end (I know my lab preps everything for paired end, even if they don't ever read the other end); it's not like having paired end data makes your counts twice as good as single end.

Many thanks all!! appreciate the response and clarification... this indeed makes perfect sense.. a quick few follow up questions:

With this as a caveat: is it ok to simply proceed with ~Group in the model , to model differences between the difference conditions (groups)?

And secondly, in the case where I would be able to model a batch effect in a scenario like this? Taking out read 2 for PE seq makes sense..

What would i do where libraries have been prepped on different days? would it be the case that i have to representation of all groups in both batches?

so having said that, I should proceed with the caveat that neither sequencing contains the same group?? of course experimental validation would follow!