I would be really grateful if somebody could provide some insight into how to compare two experiments that were sequenced at different times, and potentially in a different manner (for example, single-end vs paired-end). I am trying to model the batch effect in DESeq2 so that I can account for it, but I'm running into trouble and am unsure why:
I have two datasets for which I have combined the raw counts and the metadata. Everything is in order: the columns of the count matrix match the rows of the metadata, etc. However, when I create the dataset with this design:
dds <- DESeqDataSetFromMatrix(countData = mergedCounts1, colData = metadataRobert, design = ~ Batch + Group)
I get an error:
Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified.
As far as I am aware, these are not the same: Batch simply has two levels (batch 1 and batch 2), while Group has multiple levels (control, trt1, trt2, trt3, etc.).
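For reference, the full-rank check does not require Batch and Group to be identical columns. One common cause of this error is that the Batch indicator can be rebuilt from the Group indicators, for example when every group was sequenced in only one batch. The base-R sketch below compares a nested layout with a balanced one; both `meta_*` tables are hypothetical eight-sample layouts made up for illustration, not taken from metadataRobert:

```r
# Hypothetical layout A: each group is sequenced in only one batch (nested)
meta_nested <- data.frame(
  Batch = factor(rep(c("b1", "b2"), each = 4)),
  Group = factor(rep(c("control", "trt1", "trt2", "trt3"), each = 2))
)

# Hypothetical layout B: every group appears in both batches (balanced)
meta_balanced <- data.frame(
  Batch = factor(rep(c("b1", "b2"), each = 4)),
  Group = factor(rep(c("control", "trt1", "trt2", "trt3"), times = 2))
)

# A zero in a row of the cross-table means that group never occurs in that batch
print(table(meta_nested$Group, meta_nested$Batch))

# The same kind of design matrix DESeq2 builds for ~ Batch + Group
mm_nested <- model.matrix(~ Batch + Group, data = meta_nested)
mm_balanced <- model.matrix(~ Batch + Group, data = meta_balanced)

# A design is full rank when its rank equals its number of columns.
# In layout A the Batchb2 column equals Grouptrt2 + Grouptrt3, so one column is redundant.
is_full_rank <- function(mm) qr(mm)$rank == ncol(mm)
print(c(nested = is_full_rank(mm_nested), balanced = is_full_rank(mm_balanced)))
# expected: nested FALSE, balanced TRUE
```

Running `table(metadataRobert$Group, metadataRobert$Batch)` on the real metadata would show whether the same nesting is present there.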
I guess the first question is how to overcome this within the model. The second question is whether a valid workaround would be to run removeBatchEffect from limma on the VST counts, then re-plot the PCA and do the downstream analysis on that?
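On the limma question: as far as I can tell from the limma source (worth checking against its documentation), `removeBatchEffect()` fits the batch terms together with the `design` you pass, subtracts only the fitted batch part, and treats non-estimable batch coefficients as zero. If so, it hits the same identifiability problem as DESeq2. The base-R sketch below uses one hypothetical gene in a hypothetical layout where each group sits in a single batch, and shows that an ordinary linear model cannot estimate the batch coefficient there:

```r
# Hypothetical layout: groups nested in batches (control/trt1 in b1, trt2/trt3 in b2)
meta <- data.frame(
  Batch = factor(rep(c("b1", "b2"), each = 4)),
  Group = factor(rep(c("control", "trt1", "trt2", "trt3"), each = 2))
)

# Hypothetical VST-like values for a single gene across the 8 samples
expr <- c(5.1, 4.9, 6.0, 6.2, 9.1, 8.8, 10.2, 9.9)

# Fit the biological design plus batch, similar in spirit to what
# removeBatchEffect() does when given a design matrix
fit <- lm(expr ~ Group + Batch, data = meta)

# lm() reports NA for a coefficient that is aliased with earlier terms
print(coef(fit))
batch_estimable <- !is.na(coef(fit)[["Batchb2"]])
print(batch_estimable)
```

With a layout like this, the shift between b1 and b2 is entirely absorbed by the Group terms, so there is no separate batch estimate left to subtract.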
This is particularly confusing for me because the variance between groups is quite large, and the groups also correspond to batches. However, since the actual groups are completely different, this may also be normal! So it is hard to tell whether 90% or 70% of the variance is due to a batch effect. I imagine that if there were a batch effect, samples that are truly similar but were sequenced differently would sit close to each other without overlapping, rather than appearing 90% different? I am unsure how to tackle batch effects from this point of view.
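On the PCA percentages: the value on each axis is the share of total variance along that principal component. DESeq2's `plotPCA()` computes it this way on the `ntop` most variable genes (500 by default). It describes how much the samples spread along that axis, not what caused the spread. The base-R sketch below uses a purely simulated matrix (hypothetical data, not the real VST counts) with a shift in half the genes for samples 5 to 8. It computes the same percentages and checks how PC1 lines up with hypothetical batch labels. If groups were nested in batches, this PC1 would look the same whether the shift was technical or biological:

```r
# Hypothetical 50 genes x 8 samples of VST-like values
set.seed(42)
mat <- matrix(rnorm(50 * 8, mean = 8), nrow = 50,
              dimnames = list(NULL, paste0("s", 1:8)))

# Shift half the genes in samples 5-8 to mimic a difference aligned with batch
mat[1:25, 5:8] <- mat[1:25, 5:8] + 3

# PCA with samples as rows, as plotPCA() does on the selected genes
pca <- prcomp(t(mat))

# Fraction of total variance captured by each principal component
percent_var <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * percent_var[1:2], 1))

# Mean PC1 score per hypothetical batch: opposite signs mean PC1 splits the batches
batch <- factor(rep(c("b1", "b2"), each = 4))
print(tapply(pca$x[, 1], batch, mean))
```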