Question

DESeq2 design tutorial with multiple experimental factors

christophe.vanhaver • 0

@christophevanhaver-21827

Belgium

Dear all,

First of all, I should say that I'm new to RNA-seq analysis and to the DESeq2 package. I also have only (very) basic knowledge of statistics, so my apologies if I'm asking naive questions :)

We would like to analyse different cell populations that we isolated from different samples/environments (blood, ascites, tumor) from different patients. RNA sequencing was done in bulk. Because these data were generated in a collaboration between several research groups, not all the cells were isolated in the same lab. I would of course like to test the effect of this parameter.

The idea behind my design is the following: because I expect differences between cell types (of course) and conditions (the environment), I've created a new column in my annotation object that combines (paste0) the cell_type and condition columns. In brief, I will consider "gMDSC from blood" a different cell population from "gMDSC from ascites".
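For illustration, a combined column of this kind could be built roughly as follows. This is only a sketch on a toy data frame: the sample names S1 to S4 and their values are hypothetical, and the underscore separator is an assumption based on the group labels in the table.

```r
# Toy annotation table (hypothetical samples, not the real data)
annot <- data.frame(
  cell_type = c("gMDSC", "gMDSC", "gMDSC", "gMDSC"),
  cond      = c("Cancer_Blood", "Ascites", "Tumor", "Ascites"),
  row.names = c("S1", "S2", "S3", "S4"),
  stringsAsFactors = FALSE
)

# Combine cell type and condition into a single grouping factor.
# paste0(a, "_", b) would give the same labels as paste(a, b, sep = "_").
annot$group <- factor(paste(annot$cell_type, annot$cond, sep = "_"))

# Each distinct cell_type/cond combination becomes one level
print(levels(annot$group))
```

Storing the column as a factor makes the set of levels explicit before it goes into a design formula.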

Here's a (partial) example of my annotation data frame, with sample names as row names.

                 cell_type          cond origin                 group
CA.gMDSC.Blood       gMDSC  Cancer_Blood      1    gMDSC_Cancer_Blood
DE.gMDSC.Ascites     gMDSC       Ascites      1         gMDSC_Ascites
DE.gMDSC.Blood       gMDSC  Cancer_Blood      1    gMDSC_Cancer_Blood
DO.gMDSC.Blood       gMDSC  Cancer_Blood      1    gMDSC_Cancer_Blood
FR.gMDSC.Ascites     gMDSC       Ascites      1         gMDSC_Ascites
FR.gMDSC.Blood       gMDSC  Cancer_Blood      1    gMDSC_Cancer_Blood
FR.gMDSC.Spleen      gMDSC Cancer_Spleen      1   gMDSC_Cancer_Spleen
KD.gMDSC.Ascites     gMDSC       Ascites      1         gMDSC_Ascites
KD.gMDSC.Blood       gMDSC  Cancer_Blood      1    gMDSC_Cancer_Blood
NO.gMDSC.Ascites     gMDSC       Ascites      1         gMDSC_Ascites
NO.gMDSC.Tumor       gMDSC         Tumor      1           gMDSC_Tumor
ON.gMDSC.Blood       gMDSC  Cancer_Blood      1    gMDSC_Cancer_Blood
ON.gMDSC.Tumor       gMDSC         Tumor      1           gMDSC_Tumor
RE.gMDSC.Blood       gMDSC  Cancer_Blood      1    gMDSC_Cancer_Blood
RE.gMDSC.Tumor       gMDSC         Tumor      1           gMDSC_Tumor
RI.gMDSC.Blood       gMDSC  Cancer_Blood      1    gMDSC_Cancer_Blood
RI.gMDSC.Tumor       gMDSC         Tumor      1           gMDSC_Tumor
SH.gMDSC.Tumor       gMDSC         Tumor      1           gMDSC_Tumor
TI.gMDSC.Ascites     gMDSC       Ascites      1         gMDSC_Ascites
TI.gMDSC.Tumor       gMDSC         Tumor      1           gMDSC_Tumor
A01.gMDSC            gMDSC       Ascites      2         gMDSC_Ascites
A03.gMDSC            gMDSC       Ascites      2         gMDSC_Ascites

The values 1, 2, 3 and 4 are the four levels of my "origin" factor and correspond to the different research groups that isolated the cells.

As I understand the DESeq2 design formula, you put the factor you want to compare in your analysis last, and the factors you want to "control" for first. I take "control" here to mean accounting for the variability due to that factor while testing for differentially expressed genes (DEGs) for the factor of interest.

Here was my formula:

dds <-  DESeqDataSetFromMatrix(countData = cnt,
                             colData = annot,
                             design = ~ origin + group)

Unfortunately, I got this error message:

"Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed. Please read the vignette section 'Model matrix not full rank': vignette('DESeq2')"

If I remove "origin" from my design formula, the script runs fine. But I feel that I am missing something quite important there.
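A rough way to see where such an error can come from is to cross-tabulate the two factors and check the rank of the model matrix in base R. The sketch below uses a made-up annotation in which one lab (origin 3) isolated only one group and no other lab isolated that group. Whether this is what happens in the real data is an assumption that the full annotation table would have to confirm.

```r
# Hypothetical annotation: origin 3 is perfectly confounded with group "B_Tumor"
annot <- data.frame(
  origin = factor(c(1, 1, 1, 1, 2, 2, 3, 3)),
  group  = factor(c("A_Blood", "A_Blood", "A_Ascites", "A_Ascites",
                    "A_Ascites", "A_Ascites", "B_Tumor", "B_Tumor"))
)

# Cross-tabulation: a group that appears in only one origin, where that
# origin contains nothing else, cannot be separated from the origin effect
print(table(origin = annot$origin, group = annot$group))

# Build the model matrix for the design and compare its rank with its
# number of columns
mm <- model.matrix(~ origin + group, data = annot)
rank_mm <- qr(mm)$rank
full_rank <- rank_mm == ncol(mm)
print(c(columns = ncol(mm), rank = rank_mm))

# The same check for the design without origin
mm_group <- model.matrix(~ group, data = annot)
full_rank_group <- qr(mm_group)$rank == ncol(mm_group)
print(c(with_origin = full_rank, group_only = full_rank_group))
```

In this toy case the origin3 column and the groupB_Tumor column of the model matrix are identical, so the design with origin is not full rank while the group-only design is.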

So I'm quite lost here... Am I going in the right direction for this kind of analysis (comparing cell populations), or am I completely wrong?

Thanks in advance for your help, and sorry if I forgot to include some important information; do not hesitate to ask for it :)

Chris

deseq2 annotation normalization • 835 views

5.9 years ago • christophe.vanhaver

score 0 · Answer 1 · 2020-03-23

Kevin Blighe ★ 4.0k

@kevin

The Cave, 181 Longwood Avenue, Boston, …

With sample names put as rownames. 1, 2, 3 and 4 are the 4 levels of my "origin" factor, and correspond to the different research group that isolated the cells

Does origin need to be included, in that case? Are you expecting inter-laboratory differences? If your assumption is that there are no differences, then exclude origin from the formula, which will also resolve the error about the model not being 'full rank'.
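One rough way to look for inter-laboratory differences before deciding is a principal component analysis of the samples, labelled by origin. The sketch below uses simulated counts and base R only (log2 counts and prcomp), which is a simplified stand-in. With a real DESeqDataSet one would more typically use DESeq2's vst() followed by plotPCA() with intgroup = "origin". All values here are made up for illustration.

```r
set.seed(1)

n_genes <- 200
origin  <- factor(rep(c(1, 2), each = 4))   # 8 hypothetical samples, 2 labs

# Simulated expected counts, with an artificial lab effect on the
# first 50 genes in samples from origin 2
lambda <- matrix(rexp(n_genes, rate = 1 / 100), nrow = n_genes, ncol = 8)
lambda[1:50, origin == 2] <- lambda[1:50, origin == 2] * 3
counts <- matrix(rpois(length(lambda), lambda), nrow = n_genes)

# Simple log transform (a stand-in for a variance-stabilizing transform)
log_counts <- log2(counts + 1)

# PCA with samples as rows
pca <- prcomp(t(log_counts))

# Mean PC1 score per lab: clearly different means suggest a lab effect
pc1_by_origin <- tapply(pca$x[, 1], origin, mean)
print(pc1_by_origin)
```

If the samples separate by origin on the leading components, that is a sign that the lab should be accounted for, provided the design allows it.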

5.9 years ago • Kevin Blighe ★ 4.0k

Hi and thank you,

I don't really know whether it is required. Maybe no differences are to be expected, but this is a factor I would like to check.

If I use only group in the design, it works. But I'm afraid I would be missing something then.

Chris

5.9 years ago • christophe.vanhaver

I'd recommend meeting with a statistician to guide your choices in the design.

5.9 years ago • Michael Love 43k