Not full rank DESeq2
@2a6aaea2

Hi everybody,

Recently I have been doing differential gene expression analysis. I am new to this, and I struggle a bit with my design.

I have 4 different cell lines: A, B, C and D. In previous experiments I found that cell lines A and B are resistant to radiation, while cell lines C and D are sensitive. I am interested in the genes that are differentially expressed between the resistant and the sensitive group. However, I also want to control for cell-line-specific differences.

My dataset looks something like this:

# creating example dataframe
df <- data.frame(cline = factor(rep(c("A", "C", "B", "D"), each = 3)),
                 replicate = factor(rep(c("1", "2", "3"), times = 4)),
                 group = factor(rep(rep(c("resistant", "sensitive"), each = 3), 2)))
df <- df[order(df$cline), ]

# show dataframe
print(df)


Initially I tried to model this using:

design = ~ cline + group


However, when I use this design in DESeq2, I get a 'the model matrix is not full rank' error. I know this is probably because each cell line belongs to exactly one group, so the resistant and sensitive groups are fully determined by the clines. However, I am unsure how to redefine the design formula to account for this.
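The rank problem described above can be reproduced without DESeq2. This is a minimal sketch in base R, using the same sample table as in the question; the variable names `mm`, `n_cols`, `mm_rank` and `redundant` are only illustrative.

```r
# Same sample table as in the question (base R only, DESeq2 is not needed here)
df <- data.frame(
  cline = factor(rep(c("A", "C", "B", "D"), each = 3)),
  replicate = factor(rep(c("1", "2", "3"), times = 4)),
  group = factor(rep(rep(c("resistant", "sensitive"), each = 3), 2))
)

# Each cell line falls into exactly one group
print(table(df$cline, df$group))

# Build the model matrix for the design that triggers the error
mm <- model.matrix(~ cline + group, data = df)
n_cols <- ncol(mm)       # number of coefficients the design asks for
mm_rank <- qr(mm)$rank   # number of coefficients that can be estimated

# The group column is exactly the sum of the clineC and clineD columns,
# so it adds no information beyond what the cell-line columns already carry
redundant <- all(mm[, "groupsensitive"] == mm[, "clineC"] + mm[, "clineD"])

print(c(columns = n_cols, rank = mm_rank))
print(redundant)
```

The design asks for five coefficients but the matrix only has rank four, which is the situation the error message refers to.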

Any help would be really welcome!

Kind regards,

@james-w-macdonald-5106

You can't adjust the design matrix to account for it. You could make the assumption that the cline doesn't matter, and compare the resistant and sensitive, or you could assume that the groups don't matter and compare the clines. But you can't account for both, because the clines are nested in the groups. Think of it this way. Let's assume you have four groups of people, two of which are 'resistant' and two are 'sensitive'. And you want to compare their weights.

But the resistant and the sensitive people are in separate places, so one set gets weighed on a rusted old bathroom scale that somebody found, and the other set gets weighed on one of those new electronic scales. If you wanted to compare the weights between resistant and sensitive, you would never be sure if the differences were real, or if they were simply due to the differences between the two scales. And there isn't anything you could do to fix that, because all of the resistant people were measured on the rusty scale and all the sensitive people were measured on the electronic one. You could say that you think both scales are accurate enough, and just make the comparisons, but if you try to publish and you mention that the scales were different but you assume it's OK, the reviewers are probably going to have some pointed questions about that assumption.
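The same point can be made numerically. The sketch below is only an illustration with made-up effect sizes (the values 5 and 2 are assumptions, not estimates from any data): a pure group effect and a pure cell-line effect give identical expected values for every sample, so no amount of data from this layout can tell the two apart.

```r
# Sample layout from the question: A and B resistant, C and D sensitive
df <- data.frame(
  cline = factor(rep(c("A", "B", "C", "D"), each = 3)),
  group = factor(rep(c("resistant", "sensitive"), each = 6))
)
X <- model.matrix(~ cline + group, data = df)

# Made-up log-scale effects, purely for illustration.
# Story 1: sensitive samples are shifted by +2, cell lines do not matter
beta_group <- c("(Intercept)" = 5, clineB = 0, clineC = 0, clineD = 0,
                groupsensitive = 2)
# Story 2: no group effect, lines C and D each happen to be shifted by +2
beta_cline <- c("(Intercept)" = 5, clineB = 0, clineC = 2, clineD = 2,
                groupsensitive = 0)

# Expected value per sample under each story
mu_group <- drop(X %*% beta_group[colnames(X)])
mu_cline <- drop(X %*% beta_cline[colnames(X)])

# Both stories predict exactly the same values for all 12 samples
same_fit <- isTRUE(all.equal(mu_group, mu_cline))
print(same_fit)
```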


Dear Dr. MacDonald,

Thank you for your insightful explanation. I assume there is also no way to account for this in the experimental design (I don't mean the model matrix but the actual experimental design)? In other words, would this always be a problem in these kinds of experiments and comparisons?

Kind regards,


No, you can't account for it in the experimental design either. You can get a set of genes that are differentially expressed, but at that point you have a conundrum. You already know the cell lines have different phenotypes (the sensitivity or resistance to radiation), and those differences may be conferred by one or more of the genes that are different (or maybe not - it's always possible that the difference is due to a mutation in one or more genes, and they may well be expressed at the same level). But they are different cell lines, and there may be other differences as well that have nothing to do with the radiation sensitivity.

You can always get the list of differentially expressed genes, then try to infer which of them are responsible for the difference in radiation sensitivity, and do knockouts in the resistant lines to check.
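Getting that list means falling back on one of the two reduced designs mentioned earlier in the thread. As a rough sketch, the base R check below confirms that `~ group` and `~ cline` are each full rank while the combined design is not; the helper `is_full_rank` is hypothetical and not part of DESeq2. The commented-out DESeq2 lines assume a count matrix called `counts` with one column per row of `df`, which is not given in the thread.

```r
# Sample layout from the question: A and B resistant, C and D sensitive
df <- data.frame(
  cline = factor(rep(c("A", "B", "C", "D"), each = 3)),
  group = factor(rep(c("resistant", "sensitive"), each = 6))
)

# Hypothetical helper: TRUE if every coefficient in the design is estimable
is_full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data = data)
  qr(mm)$rank == ncol(mm)
}

rank_check <- c(
  group_only = is_full_rank(~ group, df),
  cline_only = is_full_rank(~ cline, df),
  both = is_full_rank(~ cline + group, df)
)
print(rank_check)

# With DESeq2 installed and a count matrix `counts` (an assumption, not
# provided in the thread), the group-only comparison could look like this:
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = df,
#                                       design = ~ group)
# dds <- DESeq2::DESeq(dds)
# res <- DESeq2::results(dds, contrast = c("group", "sensitive", "resistant"))
```

Using `~ group` treats the two cell lines within each group as if they were interchangeable, which is exactly the assumption a reviewer may question.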