Question

DESeq2 nested effect with 5 variables

sandeep.amberkar18 ▴ 10

@sandeepamberkar18-21432

Last seen 3.7 years ago

Rothamsted Research UK

Hello All,

I'm working with a rather complex RNA-seq experiment in which the tested variables follow a nested structure, which can be represented like this:

SampleName  tissue  temp  time  dev_stage  rep
Sample1     crown   21    am    ds1        rep1
Sample2     crown   21    am    ds1        rep3
Sample3     crown   21    am    ds1        rep4
Sample4     crown   21    am    ds2        rep1
Sample5     crown   21    am    ds2        rep2
Sample6     crown   21    am    ds2        rep3
Sample7     crown   21    am    ds2        rep4
Sample8     crown   21    am    ds3        rep1
Sample9     crown   21    am    ds3        rep2
Sample10    crown   21    am    ds3        rep3
Sample11    crown   21    am    ds3        rep4

Based on several related posts about experimental design for multifactor experiments, I came up with a design formula that looks like this:

library(DESeq2)

# Design: no intercept, four-way interaction term only
exp.dds <- DESeqDataSetFromMatrix(countData = counts_noZeros.df,
                                  colData = exp_coldata,
                                  design = ~ 0 + tissue:temp:time:dev_stage)

I'm interested in determining the effect of these 4 variables on gene expression, and I have 2 questions:

1) If I keep the design as is, will DESeq2 model a gene's expression as a nested effect of these variables? If so, is the model specified correctly?

2) If I want to determine the effect of only one of these variables on expression, should the design formula look something like this?

design = ~0+tissue

Any help is appreciated. With thanks.

Best, Sandeep

deseq2 RNA-seq • 637 views

updated 4.7 years ago by Michael Love 41k • written 4.7 years ago by sandeep.amberkar18 ▴ 10

score 0 · Answer 1 · 2019-07-23

Michael Love 41k

@mikelove

Last seen 4 hours ago

United States

There isn't a single correct design for a given experiment; it depends on your assumptions about the samples.

I often recommend that users consult a statistician to discuss how linear models can be used to answer various questions, given assumptions about the samples.

Check out the section of the vignette on interactions, which lists some of the considerations, but generally I'd recommend meeting or collaborating with someone familiar with linear models.

4.7 years ago • Michael Love 41k

Hi Michael,

Sorry, I should have explained the experimental design better. The full experiment consists of an expression dataset comprising:

2 cultivars, each with
2 tissues, sampled at
2 temperatures, taken at
2 time points, sampled at
3 dev. stages, each with 4 replicates

The metadata therefore looks something like this:

SampleName  cultivar  tissue  temp  time  dev_stage  rep
Sample1     cadenza   crown   21    am    ds2        rep1
Sample2     cadenza   crown   21    am    ds2        rep2
Sample3     cadenza   crown   21    am    ds2        rep3
Sample4     cadenza   crown   21    am    ds2        rep4
Sample5     cadenza   crown   21    pm    ds2        rep1
Sample6     cadenza   crown   21    pm    ds2        rep2
Sample7     cadenza   crown   21    pm    ds2        rep3
Sample16    cadenza   leaf    21    am    ds2        rep1
Sample17    cadenza   leaf    21    am    ds2        rep2
Sample18    cadenza   leaf    21    am    ds2        rep3
Sample19    cadenza   leaf    21    am    ds2        rep4
Sample20    cadenza   leaf    21    pm    ds2        rep1
Sample21    cadenza   leaf    21    pm    ds2        rep2
Sample22    cadenza   leaf    21    pm    ds2        rep3
Sample23    cadenza   leaf    21    pm    ds2        rep4
Sample24    paragon   crown   21    am    ds2        rep1
Sample25    paragon   crown   21    am    ds2        rep2
Sample26    paragon   crown   21    am    ds2        rep3
Sample27    paragon   crown   21    am    ds2        rep4
Sample28    paragon   crown   21    pm    ds2        rep1
Sample29    paragon   crown   21    pm    ds2        rep2
Sample30    paragon   crown   21    pm    ds2        rep3
Sample31    paragon   leaf    21    am    ds2        rep1
Sample32    paragon   leaf    21    am    ds2        rep2
Sample33    paragon   leaf    21    am    ds2        rep3
Sample34    paragon   leaf    21    am    ds2        rep4
Sample35    paragon   leaf    21    pm    ds2        rep1
Sample36    paragon   leaf    21    pm    ds2        rep2
Sample37    paragon   leaf    21    pm    ds2        rep3
Sample38    paragon   leaf    21    pm    ds2        rep4

The key question is: can I test the effect of a single variable at a time? If I do, I fear I might be ignoring the confounding effect of the other variables. Or should I use a nested model down to the level of the variable whose effect I want to test? For instance, if I wanted to check the effect of time, would the correct design be

design = ~0+cultivar:tissue:temp:time

or should it be

design = ~0+time
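
To make the difference between the two candidates concrete, here is a minimal base-R sketch that builds both design matrices on a toy, fully balanced sample sheet. It is only an illustration, not a recommendation of either design: the object names (coldata, mm_full, mm_time) and the second temperature label "tempB" are made up, and every variable is assumed to be stored as a factor.

```r
# Toy sample sheet: 2 levels per factor, 4 replicates per combination.
# "tempB" is a hypothetical label; only 21 appears in the metadata shown.
coldata <- expand.grid(
  rep      = paste0("rep", 1:4),
  time     = c("am", "pm"),
  temp     = c("21", "tempB"),
  tissue   = c("crown", "leaf"),
  cultivar = c("cadenza", "paragon"),
  stringsAsFactors = TRUE
)

# Candidate 1: interaction term only, no intercept.
mm_full <- model.matrix(~ 0 + cultivar:tissue:temp:time, data = coldata)

# Candidate 2: time only, no intercept.
mm_time <- model.matrix(~ 0 + time, data = coldata)

ncol(mm_full)       # number of coefficients in candidate 1
qr(mm_full)$rank    # equals ncol(mm_full) when the matrix is full rank
colnames(mm_time)   # coefficients in candidate 2
```

On this toy sheet the first matrix has one column per cultivar/tissue/temp/time combination (16 in total), whereas the second has only two columns, one per time point, so cultivar, tissue and temp are not modelled at all in it.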

Let me know if I missed something in either of these models.

Thanks.

Best, Sandeep

4.7 years ago • sandeep.amberkar18 ▴ 10

Unfortunately, I don’t have spare time to work out what users’ statistical analyses and designs should be; I have to limit my time on the support site to software-related questions.

4.7 years ago • Michael Love 41k
