Question

DESeq2 - experimental design with 2 variables

Jane Merlevede ▴ 90

Dear all,

I have trouble writing the design formula for an RNA-seq experiment with 16 samples.

There are 5 controls and 11 tumors. The tumors come from 2 cell lines and have either low or high intensity:

    Sample name  Intensity  CellLine
1   CDX_1_1      Low_BLI    CDX_1
2   CDX_1_2      Low_BLI    CDX_1
3   CDX_2_1      High_BLI   CDX_2
4   CDX_2_2      High_BLI   CDX_2
5   CDX_2_3      Low_BLI    CDX_2
6   CDX_2_4      Low_BLI    CDX_2
7   CDX_2_5      Low_BLI    CDX_2
8   CDX_2_6      High_BLI   CDX_2
9   CDX_2_7      High_BLI   CDX_2
10  CDX_2_8      High_BLI   CDX_2
11  CDX_2_9      Low_BLI    CDX_2
12  CTL_1        0_BLI      Ctrl
13  CTL_2        0_BLI      Ctrl
14  CTL_3        0_BLI      Ctrl
15  CTL_4        0_BLI      Ctrl
16  CTL_5        0_BLI      Ctrl

I get the following error message:

dds=DESeqDataSetFromHTSeqCount(DataFrame,"/home/htseqCount_25082017", ~Intensity+CellLines+Intensity:CellLines)
Error in checkFullRank(modelMatrix) :
  the model matrix is not full rank, so the model cannot be fit as specified.
  Levels or combinations of levels without any samples have resulted in
  column(s) of zeros in the model matrix.

I read the “Model matrix not full rank” section and tried to find similar designs, but it did not help.

The experimental design seems simple, but I don't understand why I get this message. I agree that for Ctrl and CellLine1 the intensity variable is redundant with the cell line, but that is not the case for CellLine2.

I need to answer questions like: What are the differences between the 5 controls and the 9 tumors of CellLine2?

Apart from writing something like:

1  CDX_1_1   CDX_1_Low
2  CDX_1_2   CDX_1_Low
3  CDX_2_1   CDX_2_High
4  CDX_2_2   CDX_2_High
5  CDX_2_3   CDX_2_Low
6  CDX_2_4   CDX_2_Low
7  CDX_2_5   CDX_2_Low
8  CDX_2_6   CDX_2_High
9  CDX_2_7   CDX_2_High
10 CDX_2_8   CDX_2_High
11 CDX_2_9   CDX_2_Low
12     CTL_1   Ctrl
13     CTL_2   Ctrl
14     CTL_3   Ctrl
15     CTL_4   Ctrl
16     CTL_5   Ctrl

which does not seem clean, I don't see how to use only 1 variable or how to write this design differently.

Any suggestion would be appreciated.

Thank you

deseq2 multiple factor design • 2.4k views

ADD COMMENT • link 7.2 years ago Jane Merlevede ▴ 90

Ok, thank you for your answer

ADD REPLY • link 7.2 years ago Jane Merlevede ▴ 90

If I may ask one more thing, I would like a confirmation:

To look for the differences between the 5 controls and the 6 tumors with low intensity (whatever the cell type), I think I should use the contrast:

resultsNames(ddsFinal)
[1] "Intercept"                      "Intensity_CellLinesCDX_1_Low"  "Intensity_CellLinesCDX_2_High" "Intensity_CellLinesCDX_2_Low"  "Intensity_CellLinesCtrl"

res=results(ddsFinal,contrast=c(0,1,0,1,-1),pAdjustMethod="BH", cooksCutoff=TRUE,alpha=0.01, independentFiltering=TRUE)

With this I get 120 differentially expressed genes.

I also tried this:

results(ddsFinal,contrast=c(0,1/2,0,1/2,-1),pAdjustMethod="BH", cooksCutoff=TRUE,alpha=0.01, independentFiltering=TRUE)

With this I get 270 differentially expressed genes.

Am I right with the first solution? And why would the second be incorrect?

Thank you in advance

ADD REPLY • link 7.2 years ago Jane Merlevede ▴ 90

If you want the average of the low-intensity tumors, the coefficients need to be 1/2.

ADD REPLY • link 7.2 years ago Michael Love 42k
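
As an illustration of the reply above, here is a minimal base R sketch of what the two numeric contrasts compute for a single gene. The coefficient values are hypothetical numbers chosen for the example, not results from this dataset, and the shortened names simply follow the order of the resultsNames() output quoted earlier. It assumes, as that output suggests, a parameterization with one coefficient per group in addition to the intercept.

```r
# Hypothetical log2-scale coefficients for one gene (made-up values),
# ordered like resultsNames(): Intercept, CDX_1_Low, CDX_2_High, CDX_2_Low, Ctrl.
beta <- c(Intercept = 8, CDX_1_Low = 1.0, CDX_2_High = 2.0,
          CDX_2_Low = 0.5, Ctrl = -1.0)

avg_contrast <- c(0, 1/2, 0, 1/2, -1)  # average of the two low groups minus control
sum_contrast <- c(0, 1, 0, 1, -1)      # sum of the two low groups minus control

lfc_avg <- sum(avg_contrast * beta)
lfc_sum <- sum(sum_contrast * beta)

# Group means on the log2 scale: intercept plus the group's own coefficient.
mean_low  <- mean(c(beta[["Intercept"]] + beta[["CDX_1_Low"]],
                    beta[["Intercept"]] + beta[["CDX_2_Low"]]))
mean_ctrl <- beta[["Intercept"]] + beta[["Ctrl"]]

# The 1/2 contrast equals the difference of group means; the 1/1 contrast does not.
print(c(lfc_avg = lfc_avg, lfc_sum = lfc_sum, mean_diff = mean_low - mean_ctrl))
```

The weights in c(0,1/2,0,1/2,-1) sum to zero, so the contrast is a difference between the mean of the two low-intensity groups and the control group. The weights in c(0,1,0,1,-1) sum to one, so that contrast adds the two group effects rather than averaging them. Note that the 1/2 weights treat the two low-intensity groups equally, regardless of how many samples each contains.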

Thank you for your reply.

I guess that the sample size is taken into account when estimating the average of the low-intensity tumors? There are 2 samples in CellLine1 and 4 in the other one. Since there are fewer samples in CellLine1, I want them to contribute less to the model.

Sorry, I don't clearly see the difference with contrast=c(0,1,0,1,-1). Am I looking here at the effect of CellLine1 + CellLine2 compared to the 5 controls?

For me, it is unclear if I should look at the average of the low intensity tumors. My aim for this specific question is to look for the differences between the 5 controls and the 6 tumors with low intensity (whatever the cell type). I tried a model with the intensity information only (0, low, high) and got 209 differentially expressed genes (between the results from contrasts c(0,1,0,1,-1) and c(0,1/2,0,1/2,-1)), but I would prefer to keep the same model for all the comparisons I have to do.

ADD REPLY • link 7.2 years ago Jane Merlevede ▴ 90

Yes, the standard errors take into account the sample size, so the errors for a coefficient are reduced as the number of samples used to calculate that coefficient grows.

For further questions about why one numeric contrast is recommended, or what the statistical meaning of numeric contrasts is, I think you should meet with a statistical collaborator.

ADD REPLY • link 7.2 years ago Michael Love 42k

Thank you for your help.

Yes, I will try to clarify these points.

ADD REPLY • link 7.2 years ago Jane Merlevede ▴ 90

score 0 · Answer 1 · 2017-08-28

Here's a question from a few weeks ago where the user had the same design, and my response:

3 factor design formula 'Model Matrix not full rank'

In short, you can't estimate the two factors separately (as explained in the post linked above), but you can combine them and use a single variable in the design, e.g. ~condition, where condition looks like the second column of the last colData you posted above.
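
To make the rank problem concrete, here is a minimal base R sketch (it does not need DESeq2) that rebuilds the sample table from the question and compares the two designs. The object names coldata, mm_full and mm_single, and the level names produced by paste(), are choices made for this illustration, not names from the original analysis.

```r
# Sample annotation copied from the table in the question (16 samples).
coldata <- data.frame(
  Intensity = factor(c("Low_BLI", "Low_BLI", "High_BLI", "High_BLI",
                       "Low_BLI", "Low_BLI", "Low_BLI", "High_BLI",
                       "High_BLI", "High_BLI", "Low_BLI", rep("0_BLI", 5))),
  CellLine = factor(c(rep("CDX_1", 2), rep("CDX_2", 9), rep("Ctrl", 5)))
)

# Two-factor design with interaction: more columns than observed combinations.
mm_full <- model.matrix(~ Intensity + CellLine + Intensity:CellLine, data = coldata)
rank_full <- qr(mm_full)$rank
zero_cols <- colnames(mm_full)[colSums(abs(mm_full)) == 0]

# Single combined factor, with the controls as the reference level.
coldata$condition <- factor(paste(coldata$CellLine, coldata$Intensity, sep = "_"))
coldata$condition <- relevel(coldata$condition, ref = "Ctrl_0_BLI")
mm_single <- model.matrix(~ condition, data = coldata)
rank_single <- qr(mm_single)$rank

cat("full design:  ", ncol(mm_full), "columns, rank", rank_full, "\n")
cat("all-zero columns:", paste(zero_cols, collapse = ", "), "\n")
cat("single factor:", ncol(mm_single), "columns, rank", rank_single, "\n")
```

Only four combinations of cell line and intensity actually occur in the data, so the interaction design has more columns than its rank allows, including columns that are entirely zero (for example, controls never have low or high intensity). The combined factor has exactly one level per observed group, so its model matrix is full rank. With such a column in colData, the DESeq2 design would be ~condition, and a pairwise comparison could then be requested with something like results(dds, contrast=c("condition", "CDX_2_Low_BLI", "Ctrl_0_BLI")), where the level names are the ones assumed in this sketch.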