Question: DESeqDataSet 2 factors with a covariate
1
4.3 years ago by
Lillyhchen10
United States
Lillyhchen10 wrote:

Hello,

I'm running DESeq with the design=~Cell_Treat + MGMT.  I'm working with 6 samples from two different patients, so 12 samples in total.  There are two factors, Cell line (HF2303 or HF 2927) and Treatment (untrt or trt).  I'm trying to include a continuous covariate into the design (MGMT) to control for the effect of the factors based off of MGMT presence in the cell.

 > colData(se)
DataFrame with 12 rows and 5 columns
sample     Cell     Treat     MGMT Cell_Treat
<factor> <factor> <integer> <factor>   <factor>
1   accepted_hits.CSC.Contc.HF2303        A         1      pos        A-1
2    accepted_hits.CSC.TMZa.HF2303        A         2      pos        A-2
3   accepted_hits.SDC.Contb.HF2303        A         1      pos        A-1
4   accepted_hits.SDC.Contc.HF2303        A         1      pos        A-1
5    accepted_hits.SDC.TMZb.HF2303        A         2      pos        A-2
...                            ...      ...       ...      ...        ...
8    accepted_hits.CSC.TMZb.HF2927        B         2      neg        B-2
9    accepted_hits.CSC.TMZc.HF2927        B         2      neg        B-2
10  accepted_hits.SDC.Contc.HF2927        B         1      neg        B-1
11   accepted_hits.SDC.TMZa.HF2927        B         2      neg        B-2
12   accepted_hits.SDC.TMZc.HF2927        B         2      neg        B-2

When I try to set up the DESeqDataSet with design=~Cell_Treat + MGMT it gives me an error that the model matrix isn't in full rank.  I thought this might be due to that MGMT factors correlate the same as the Cell factor, but I'm not sure if I should get rid of the column because id like to base the results off of whether or not the MGMT variable is present of not and the cell column would be needed to separate HF2303 from HF2927?

Am I doing something wrong in the way the design is set up or columns?

This is the full error message:

> dds<-DESeqDataSet(se, design=~Cell_Treat + MGMT)
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
See the section 'Model matrix not full rank' in vignette('DESeq2')

modified 4.3 years ago by Michael Love26k • written 4.3 years ago by Lillyhchen10
Answer: DESeqDataSet 2 factors with a covariate
0
4.3 years ago by
United States
James W. MacDonald51k wrote:

You say that MGMT is a continuous variable, but in your DataFrame it is coded as a factor with two levels, so you should clarify which one it is. If MGMT is a factor that is dependent on the cell type, then it is completely confounded by cell type, and cannot be in the model at the same time as cell type. In other words, both of those factors tell us the same exact thing, so you can't solve for them as if they are different things.

Oh okay thanks! I see what you're saying, I converted MGMT to a integer value as "1" and "0" instead of "pos" and "neg", but I still get the same error when I run DESeqDataSet.

> colData(se)
DataFrame with 12 rows and 5 columns
sample     Cell     Treat     MGMT Cell_Treat
<factor> <factor> <integer> <integer>   <factor>
1   accepted_hits.CSC.Contc.HF2303        A         1        1        A-1
2    accepted_hits.CSC.TMZa.HF2303        A         2        1        A-2
3   accepted_hits.SDC.Contb.HF2303        A         1        1        A-1
4   accepted_hits.SDC.Contc.HF2303        A         1        1        A-1
5    accepted_hits.SDC.TMZb.HF2303        A         2        1        A-2
...                            ...      ...       ...      ...        ...
8    accepted_hits.CSC.TMZb.HF2927        B         2        0        B-2
9    accepted_hits.CSC.TMZc.HF2927        B         2        0        B-2
10  accepted_hits.SDC.Contc.HF2927        B         1        0        B-1
11   accepted_hits.SDC.TMZa.HF2927        B         2        0        B-2
12   accepted_hits.SDC.TMZc.HF2927        B         2        0        B-2

When I ran DESeqDataSet it gave me an error that to convert my numeric integer to a factor along with the same matrix not full error.

Is there a mistake in how i have the design set up?  I'm still trying to figure out how the order of how I list things in the design work and I'm worried the order I have the variables set up in might be causing the problem

This is the code I used to make the DESeqDataSet:

> dds<-DESeqDataSet(se, design=~MGMT + Cell_Treat)
the design formula contains a numeric variable with integer values,
specifying a model with increasing fold change for higher values.
did you mean for this to be a factor? if so, first convert
this variable to a factor using the factor() function
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
See the section 'Model matrix not full rank' in vignette('DESeq2')
> levels(se$MGMT) NULL > se$MGMT <- factor(se$MGMT) > levels(se$MGMT)
[1] "0" "1"
> dds<-DESeqDataSet(se, design=~MGMT + Cell_Treat)
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
See the section 'Model matrix not full rank' in vignette('DESeq2')

Changing MGMT to 0/1 doesn't make it continuous! It will still be treated by R as a factor, and will still be confounded with the cell type. If MGMT is only produced in one cell type, then saying 'this cell produces MGMT'  and 'this cell is HF2303' tells you the same thing (e.g., if you know it's a HF2303 cell, then you also know it produces MGMT, and vice versa). Since you are not adding any new information, well, you aren't adding any new information.

You should also note that if these samples are all just from two different patients, there may be a patient-level factor that you should be accounting for.

Okay that's what I thought, but I wasn't sure if I had done something else wrong.  I ended up removing the Cell column and now I'm left with 2 factors, MGMT and Treatment.  My current design is ~Treat_MGMT with the two factors combined into a group and I am trying to contrast the untrt vs the trt for the MGMT positive and negative lines

> resultsNames(dds)
[1] "Intercept"         "Cell_Treat0.trt"   "Cell_Treat0.untrt"
[4] "Cell_Treat1.trt"   "Cell_Treat1.untrt"
> resultB<-results(dds, contrast=list(c("untrt", "Cell_Treat0.untrt"), c("trt", "Cell_Treat0.trt")))
Error in checkContrast(contrast, resNames) :
all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

Am I just going about the comparisons all wrong?  I had previously gotten like 1400 differentially expressed genes and it doesn't support what I expected to happen at all.  I thought about adding the cell types (SDC vs CSC) as another column, the cells being CSC or SDC doesn't have a direct comparison that i'm trying to make and I'm not sure whether or not to incorporate it without making a CSC contrast against a SDC

You are getting an error, which says

all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

And in your post you also list the elements of resultsNames(object). If I parse that further, it reads

all elements of the contrast <snip> should be elements of 'resultsNames(object)'

Does that not tell you what the problem is?

Yes, that makes sense.  I've been reading through forum posts and the vignette on interaction terms and it's cleared up a few things for me.  Currently I'm using:

> dds<-DESeqDataSet(se, design=~MGMT*Treat)
> dds$Treat<-relevel(dds$Treat,"untrt")
> dds<-DESeq(dds)

From what I understand, this design should give me the effect of treated vs untreated based off MGMT presence.  My goal is to find 1) the effect of MGMT presence on treatment vs untreated (does MGMT presence influence response to treatment) 2) what are the differentially expressed genes between treatments for MGMT neg 3) what are the differentially expressed genes between treatments for MGMT pos.

After running dds with the design  "dds<-DESeqDataSet(se, design=~MGMT*Treat)" the resultsNames(dds) that i get are:

> resultsNames(dds)
[1] "Intercept"          "MGMT_1_vs_0"        "Treat_trt_vs_untrt"
[4] "MGMT1.Treattrt"

However, I'm not sure what to contrast to find the effect of MGMT  presence on treatment vs untreated based off the resultNames.  Is the design MGMT*Treat any different from using MGMT:Treat or MGMT + Treat?

Thanks! sorry for all the questions i'm just starting to pick up bioinformatics and r

Answer: DESeqDataSet 2 factors with a covariate
0
4.3 years ago by
Michael Love26k
United States
Michael Love26k wrote:

Let me just add a quick suggestion that you discuss your desired analysis with a local statistician at your institute.

They can help explain in person the issue with including MGMT and Cell_Treat at the same time in the model, and so help guide your choice of design and the interpretation of results.

While it's possible to run statistical software without a statistical background, a lot of the basics and interpretation of the results can best be explained in a face to face conversation.

While we've added many automatic warnings and error checks to uncover modeling problems, the lack of an error or warning doesn't necessarily mean that you're getting the analysis you want.

Hi, thanks for the suggestion, I've tried to ask a statistician at my institute before, but turns out she doesn't use R for her work.  One of my biggest issues I believe is understanding the design, if I use the design MGMT*Treat is it any different from using MGMT:Treat or MGMT + Treat and in what sense?  Does DESeq look at the interactions differently?

1

There's nothing specific to R or DESeq2 here.

I'm sticking to my earlier suggestion on the previous thread:

A: Running DESeqDataSet with 4 factors?

Don't use an interaction term. Combine the factors you want using factor(paste0(...)), use a design of ~group, then make any comparisons you like using results(dds, contrast=c("group","...","...")), replacing "..." with the levels of group you want to contrast.

Thanks! I ended up figuring it out, i think i was getting confused by levels and resultsNames()  and got them mixed up.

Thanks again for the help!