Recently I have been doing differential gene expression analysis. I am new to this, and I am struggling a bit with my design.
I have 4 different cell lines: A, B, C and D. In previous experiments I found that cell lines A and B are resistant to radiation, while cell lines C and D are sensitive. I am interested in the differential gene expression between the resistant and sensitive groups. However, I want to control for cell-line-specific differences.
My dataset looks something like this:
# create example data frame: 4 cell lines x 3 replicates
df <- data.frame(
  cline     = factor(rep(c("A", "C", "B", "D"), each = 3)),
  replicate = factor(rep(c("1", "2", "3"), 4)),
  group     = factor(rep(rep(c("resistant", "sensitive"), each = 3), 2))
)
# reorder by cell line to increase readability
df <- df[order(df$cline), ]
# show data frame
print(df)
Initially I tried to model this using:
design = ~ cline + group
However, when I use this design in DESeq2, I get a 'the model matrix is not full rank' error. I know this is probably because the resistant and sensitive groups are completely determined by the cell line (A and B are always resistant, C and D are always sensitive). However, I am unsure how to redefine the design formula to account for this.
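The rank problem can be reproduced outside DESeq2 with base R only. The following is a minimal sketch, assuming the same example sample table as above; the names mm, n_cols and mm_rank are just illustrative. It builds the model matrix for ~ cline + group and compares its number of columns to its rank.

```r
# Rebuild the example sample table (same cline/group columns as above).
df <- data.frame(
  cline = factor(rep(c("A", "C", "B", "D"), each = 3)),
  group = factor(rep(rep(c("resistant", "sensitive"), each = 3), 2))
)

# Cross-tabulate: every cell line falls into exactly one group,
# i.e. group is fully nested within cline.
print(table(df$cline, df$group))

# Build the model matrix for the design and compare rank to column count.
mm <- model.matrix(~ cline + group, data = df)
n_cols <- ncol(mm)      # number of coefficients the design asks for
mm_rank <- qr(mm)$rank  # number of linearly independent columns

cat("columns:", n_cols, "rank:", mm_rank, "\n")
```

Here the groupsensitive column equals the sum of the clineC and clineD columns, so the rank comes out one lower than the number of columns, which is the condition the error message refers to.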
Any help would be really welcome!