Hi, I'm analyzing an RNA-Seq data set and would like to run a paired multifactor test that also accounts for a batch effect. When I use the design formula
~ response + response:PID + response:timepoint
and then manually remove the columns of the model matrix that contain only zeros, DESeq2 runs the analysis as expected. However, when I change the design to
~ response + response:PID + response:timepoint + batch
I get the familiar error message
Error in checkFullRank(full) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
This is what my colData(dds) looks like
sample       PID  timepoint  batch  response
01-0603_on   P01  fiftysix   one    PD
01-0603_pre  P01  zero       one    PD
03-0601_on   P01  fiftysix   one    PR
03-0601_pre  P01  zero       one    PR
03-0606_on   P02  fiftysix   one    PR
03-0606_pre  P02  zero       one    PR
04-0605_on   P03  fiftysix   one    PR
04-0605_pre  P03  zero       one    PR
04-0632_on   P02  fiftysix   three  PD
04-0632_pre  P02  zero       three  PD
05-0614_on   P04  fiftysix   three  PR
05-0614_pre  P04  zero       three  PR
05-0621_on   P03  fiftysix   three  PD
05-0621_pre  P03  zero       three  PD
05-0624_on   P05  fiftysix   three  PR
05-0624_pre  P05  zero       three  PR
08-0631_on   P04  fiftysix   three  PD
08-0631_pre  P04  zero       three  PD
I realize that my samples aren't well balanced between batches (only one PD patient is in batch one), but I was under the impression that I could still include batch in the design formula in this case. Could you please tell me what I'm missing here, and whether there is a way to incorporate batch into my analysis?
Thanks!
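For anyone who wants to reproduce the rank problem without running DESeq2, here is a minimal sketch in base R. It assumes the table above is the complete colData; the names `patients`, `drop_zero_cols`, `mm_paired` and `mm_batch` are made up for this illustration and are not part of DESeq2. It rebuilds the sample table, drops the all-zero columns as described in the question, and compares the rank of each model matrix with its number of columns.

```r
# One row per patient, transcribed from the colData table in the question.
patients <- data.frame(
  id       = c("01-0603", "03-0601", "03-0606", "04-0605", "04-0632",
               "05-0614", "05-0621", "05-0624", "08-0631"),
  PID      = c("P01", "P01", "P02", "P03", "P02", "P04", "P03", "P05", "P04"),
  batch    = c("one", "one", "one", "one", "three", "three", "three", "three", "three"),
  response = c("PD", "PR", "PR", "PR", "PD", "PR", "PD", "PR", "PD"),
  stringsAsFactors = TRUE
)

# Two samples per patient: on treatment (fiftysix) and pre treatment (zero).
coldata <- patients[rep(seq_len(nrow(patients)), each = 2), ]
coldata$timepoint <- factor(rep(c("fiftysix", "zero"), times = nrow(patients)),
                            levels = c("zero", "fiftysix"))

# Hypothetical helper: remove model matrix columns that contain only zeros.
drop_zero_cols <- function(mm) mm[, colSums(mm != 0) > 0, drop = FALSE]

# Design that works: paired design without batch.
mm_paired <- drop_zero_cols(
  model.matrix(~ response + response:PID + response:timepoint, coldata)
)

# Design that fails: the same terms plus batch.
mm_batch <- drop_zero_cols(
  model.matrix(~ response + response:PID + response:timepoint + batch, coldata)
)

# A matrix is full rank when its rank equals its number of columns.
c(columns = ncol(mm_paired), rank = qr(mm_paired)$rank)  # 11 and 11
c(columns = ncol(mm_batch),  rank = qr(mm_batch)$rank)   # 12 and 11

# Each patient (response x PID combination) appears in exactly one batch.
individual <- interaction(coldata$response, coldata$PID, drop = TRUE)
batches_per_individual <- tapply(as.character(coldata$batch), individual,
                                 function(b) length(unique(b)))
batches_per_individual
```

Since both samples from any given patient sit in the same batch, the batch column is a sum of the patient-level columns that are already in the model, which is why adding it leaves the matrix one rank short. Under that reading, the paired design without batch already absorbs batch differences through the patient terms.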
Thank you so much for your quick answer! That makes perfect sense.