Question

DESeq2: Multivariate Models and Model Statistic

0

Entering edit mode

faded_mystery • 0

@faded_mystery-12139

Last seen 7.3 years ago

Hello,

I am working on RNA-seq data which consists of 15 samples:

Sample	Condition	Type
X1.1	LType1	UR
X1.2	LType1	UR
X1.3	LType1	UR
X2.1	LType2	UR
X2.2	LType2	UR
X2.3	LType2	UR
X3.1	LType2	UR
X3.2	LType2	UR
X3.3	LType2	UR
X4.1	LType1	DR
X4.2	LType1	DR
X4.3	LType1	DR
X5.1	LType2	DR
X5.1	LType2	DR
X5.2	LType2	DR

Although the Ligand Type (LType) was used rather than Sample to avoid “model matrix not full rank”, either Sample or Sample set (e.g. X1, X2) the following formula was used: design=~Type+Condition+Type:Condition

The comparison we’re interested in is between the UR and the DR, accounting for the differences in Sample/Condition.

The commands used are:

dds = DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~Type+Condition+Type:Condition)

dds = DESeq(dds, test="LRT", reduced=~Type:Condition)

res = results(dds, name="type_DR_vs_UR")

I have 3 questions:

1) Is the correct way to assess the comparison I am interested in?

2) Is the inclusion of an interaction term justified or not?

3) Is there a way in DESeq2 to obtain a single good-of-fit statistic for the model?

Many thanks for any comments!

R version 3.3.1, DESeq2_1.14.1

deseq2 • 2.1k views

ADD COMMENT • link updated 3.0 years ago by Michael Love 41k • written 7.3 years ago by faded_mystery • 0

score 0 · Answer 1 · 2017-01-10

The part where you say "between the UR and the DR, accounting for the differences in Sample/Condition" means that you're probably wanting

dds = DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~Type+Condition)
dds = DESeq(dds)
res = results(dds, name="type_DR_vs_UR")

The interaction term would only be included if you were interested in different UR vs DR responses in the two different conditions; here it sounds like you're just interested in normalising out the difference between conditions, and then comparing between types which is effectively what we get if we just include main effects.

You don't really need a likelihood ratio test (you could test for the same phenomenon using a LRT, but you'd have a 'reduced = ~Condition' in combination with my first line). It's conceptually easier to understand the test if you're examining a specific coefficient. So your 3rd command is correct, but a little bit at odds with the LRT procedure.

score 0 · Answer 2 · 2017-01-10

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 hour ago

United States

Quick answer first: we don't have goodness of fit measures. You can take a look at the calculated values (see vignette, Access to all calculated values), such as the expected values for the count matrix and the gene-wise dispersions. Also I recommend people look at MA plots, plots of counts, and PCA plots for quality control.

Secondly: can you say more about what is X1 - X5 and what is LType? What do these variables represent exactly? Why are X2 and X3 mapped to LType 2?

We don't have random effects in DESeq2, so you can't account for this kind of within-condition structure when it is confounded with the condition. In other words, X1+X2+X3 vs X4+X5 is linearly dependent with the DR vs UR contrast, so you cannot have a model that includes all these terms together. There is not a unique solution to that formulation.

The only way I know of accounting for this kind of structure is to use limma-voom with the duplicateCorrelation() function, which informs the model of correlation between the X's.

ADD COMMENT • link 7.3 years ago Michael Love 41k

0

Entering edit mode

Hello, Michael!

a question about the goodness of fit. Can I use the number of DEGs identified to determine whether the model should include a certain variable? I mean, If after adding a variable, the number of DEGs increases significantly, does it mean that I should add this variable to the model?

ADD REPLY • link 3.0 years ago Y. • 0

0

Entering edit mode

I'm not a fan of determining the design by # DEG.

My approach is to include variables that I believe may affect the counts. If there are a lot of technical variables and not many samples then I use SVA or RUV.

ADD REPLY • link 3.0 years ago Michael Love 41k