DESeq2: Multivariate Models and Model Statistic
2
0
Entering edit mode
Last seen 4.5 years ago

Hello,

I am working on RNA-seq data which consists of 15 samples:

 Sample Condition Type X1.1 LType1 UR X1.2 LType1 UR X1.3 LType1 UR X2.1 LType2 UR X2.2 LType2 UR X2.3 LType2 UR X3.1 LType2 UR X3.2 LType2 UR X3.3 LType2 UR X4.1 LType1 DR X4.2 LType1 DR X4.3 LType1 DR X5.1 LType2 DR X5.1 LType2 DR X5.2 LType2 DR

Although the Ligand Type (LType) was used rather than Sample to avoid “model matrix not full rank”, either Sample or Sample set (e.g. X1, X2) the following formula was used: design=~Type+Condition+Type:Condition

The comparison we’re interested in is between the UR and the DR, accounting for the differences in Sample/Condition.

The commands used are:

dds = DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~Type+Condition+Type:Condition)

dds = DESeq(dds, test="LRT", reduced=~Type:Condition)

res = results(dds, name="type_DR_vs_UR")

I have 3 questions:

1) Is the correct way to assess the comparison I am interested in?

2) Is the inclusion of an interaction term justified or not?

3) Is there a way in DESeq2 to obtain a single good-of-fit statistic for the model?

R version 3.3.1, DESeq2_1.14.1

deseq2 • 972 views
0
Entering edit mode
Gavin Kelly ▴ 590
@gavin-kelly-6944
Last seen 15 months ago
United Kingdom / London / Francis Crick…

The part where you say "between the UR and the DR, accounting for the differences in Sample/Condition" means that you're probably wanting

dds = DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~Type+Condition)
dds = DESeq(dds)
res = results(dds, name="type_DR_vs_UR")

The interaction term would only be included if you were interested in different UR vs DR responses in the two different conditions; here it sounds like you're just interested in normalising out the difference between conditions, and then comparing between types which is effectively what we get if we just include main effects.

You don't really need a likelihood ratio test (you could test for the same phenomenon using a LRT, but you'd have a 'reduced = ~Condition' in combination with my first line).  It's conceptually easier to understand the test if you're examining a specific coefficient. So your 3rd command is correct, but a little bit at odds with the LRT procedure.

0
Entering edit mode
@mikelove
Last seen 21 hours ago
United States

Quick answer first: we don't have goodness of fit measures. You can take a look at the calculated values (see vignette, Access to all calculated values), such as the expected values for the count matrix and the gene-wise dispersions. Also I recommend people look at MA plots, plots of counts, and PCA plots for quality control.

Secondly: can you say more about what is X1 - X5 and what is LType? What do these variables represent exactly? Why are X2 and X3 mapped to LType 2?

We don't have random effects in DESeq2, so you can't account for this kind of within-condition structure when it is confounded with the condition. In other words, X1+X2+X3 vs X4+X5 is linearly dependent with the DR vs UR contrast, so you cannot have a model that includes all these terms together. There is not a unique solution to that formulation.

The only way I know of accounting for this kind of structure is to use limma-voom with the duplicateCorrelation() function, which informs the model of correlation between the X's.

0
Entering edit mode

Hello, Michael!

a question about the goodness of fit. Can I use the number of DEGs identified to determine whether the model should include a certain variable? I mean, If after adding a variable, the number of DEGs increases significantly, does it mean that I should add this variable to the model?

0
Entering edit mode

I'm not a fan of determining the design by # DEG.

My approach is to include variables that I believe may affect the counts. If there are a lot of technical variables and not many samples then I use SVA or RUV.