DESeq2 design formula
tracecakes (@tracecakes-22806)

Hi all,

I am using DESeq2 to find differentially expressed genes and have been going around in circles about how to specify my design formula and how to interpret the formula itself.

I am interested in genes that are differentially expressed between two diets (A and B) in uninfected and infected mice, in three tissues, and especially in the mice that died (the infected mice on diet A). We have RNA-seq data for tissue samples collected at day 0 (uninfected, to observe the effect of diet alone) and at day 7 post-infection (see below).

I started by subsetting the data to analyse each tissue and day separately (3 x 2 DE analyses). At day 0, I saw some evidence that diet alone affects gene expression. Given that, does it make sense to account for day in the design formula?

I saw a somewhat similar post here, but I am still unsure.

> data
          Tissue Day     Status Diet  Outcome
1   Hypothalamus  D0 Uninfected    B Survived
2   Hypothalamus  D0 Uninfected    B Survived
3   Hypothalamus  D0 Uninfected    B Survived
4   Hypothalamus  D0 Uninfected    B Survived
5   Hypothalamus  D0 Uninfected    A Survived
6   Hypothalamus  D0 Uninfected    A Survived
7   Hypothalamus  D0 Uninfected    A Survived
8   Hypothalamus  D0 Uninfected    A Survived
9   Hypothalamus  D7   Infected    B Survived
10  Hypothalamus  D7   Infected    B Survived
11  Hypothalamus  D7   Infected    B Survived
12  Hypothalamus  D7   Infected    B Survived
13  Hypothalamus  D7   Infected    A     Died
14  Hypothalamus  D7   Infected    A     Died
15  Hypothalamus  D7   Infected    A     Died
16 Brown_adipose  D0 Uninfected    B Survived
17 Brown_adipose  D0 Uninfected    B Survived
18 Brown_adipose  D0 Uninfected    B Survived
19 Brown_adipose  D0 Uninfected    B Survived
20 Brown_adipose  D0 Uninfected    A Survived
21 Brown_adipose  D0 Uninfected    A Survived
22 Brown_adipose  D0 Uninfected    A Survived
23 Brown_adipose  D0 Uninfected    A Survived
24 Brown_adipose  D7   Infected    B Survived
25 Brown_adipose  D7   Infected    B Survived
26 Brown_adipose  D7   Infected    B Survived
27 Brown_adipose  D7   Infected    B Survived
28 Brown_adipose  D7   Infected    A     Died
29 Brown_adipose  D7   Infected    A     Died
30 Brown_adipose  D7   Infected    A     Died
31 Brown_adipose  D7   Infected    A     Died
32         Liver  D0 Uninfected    B Survived
33         Liver  D0 Uninfected    B Survived
34         Liver  D0 Uninfected    B Survived
35         Liver  D0 Uninfected    B Survived
36         Liver  D0 Uninfected    A Survived
37         Liver  D0 Uninfected    A Survived
38         Liver  D0 Uninfected    A Survived
39         Liver  D0 Uninfected    A Survived
40         Liver  D7   Infected    B Survived
41         Liver  D7   Infected    B Survived
42         Liver  D7   Infected    B Survived
43         Liver  D7   Infected    B Survived
44         Liver  D7   Infected    A     Died
45         Liver  D7   Infected    A     Died
46         Liver  D7   Infected    A     Died
47         Liver  D7   Infected    A     Died
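
For reference, the sample table above can be rebuilt in base R as a minimal sketch; the helper `make_tissue()` is hypothetical, written only to reproduce the printout. Setting diet B and day D0 as reference levels is an assumption, chosen so that fold changes read as A vs B and D7 vs D0. The cross-tabulation shows that every Tissue x Day x Diet cell has four replicates except infected diet-A hypothalamus, which has three.

```r
# Hypothetical helper: rebuilds one tissue's rows in the same order as the printout
# (D0 diet B x4, D0 diet A x4, D7 diet B x4, D7 diet A x n_d7_A).
make_tissue <- function(tissue, n_d7_A) {
  data.frame(
    Tissue  = tissue,
    Day     = c(rep("D0", 8), rep("D7", 4 + n_d7_A)),
    Status  = c(rep("Uninfected", 8), rep("Infected", 4 + n_d7_A)),
    Diet    = c(rep("B", 4), rep("A", 4), rep("B", 4), rep("A", n_d7_A)),
    Outcome = c(rep("Survived", 12), rep("Died", n_d7_A))
  )
}

data <- rbind(
  make_tissue("Hypothalamus", 3),   # only three infected diet-A samples
  make_tissue("Brown_adipose", 4),
  make_tissue("Liver", 4)
)

# Explicit reference levels (assumed): diet B, day D0, Uninfected, Survived
data$Tissue  <- factor(data$Tissue, levels = c("Hypothalamus", "Brown_adipose", "Liver"))
data$Day     <- factor(data$Day, levels = c("D0", "D7"))
data$Status  <- factor(data$Status, levels = c("Uninfected", "Infected"))
data$Diet    <- factor(data$Diet, levels = c("B", "A"))
data$Outcome <- factor(data$Outcome, levels = c("Survived", "Died"))

# Replicates per Diet x Day cell within each tissue
print(ftable(xtabs(~ Tissue + Day + Diet, data = data)))
```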

Would greatly appreciate any guidance, thanks!

Tracy

@mikelove

I have limited time these days, so I can't offer statistical modeling advice on the support site (e.g. whether it makes sense to include ... in the design), and have to save my time for answering software-related issues. I'd recommend collaborating with a statistician or someone familiar with linear models to discuss design considerations.

swbarnes2 (@swbarnes2-14086)

For starters, you need to figure out exactly which questions you are trying to answer, because different questions will probably need different designs.
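
As an illustration of how the design encodes the question, here is a sketch assuming a single tissue, diet B and day D0 as reference levels, and the design `~ Day + Diet + Day:Diet` (one option, not a recommendation from this thread). DESeq2 models the log2 of the expected normalized count q_gj of gene g in sample j as a linear combination of the design columns, where x_{j,D7} and x_{j,A} are 0/1 indicators for day 7 and diet A:

```latex
\log_2 q_{gj} = \beta_{g,0}
  + \beta_{g,\mathrm{D7}}\, x_{j,\mathrm{D7}}
  + \beta_{g,A}\, x_{j,A}
  + \beta_{g,\mathrm{D7}:A}\, x_{j,\mathrm{D7}}\, x_{j,A}

% Diet effect (A vs B, log2 fold change) on each day
\mathrm{LFC}_{\mathrm{D0}} = \beta_{g,A}, \qquad
\mathrm{LFC}_{\mathrm{D7}} = \beta_{g,A} + \beta_{g,\mathrm{D7}:A}

% The interaction is the difference between the two diet effects
\beta_{g,\mathrm{D7}:A} = \mathrm{LFC}_{\mathrm{D7}} - \mathrm{LFC}_{\mathrm{D0}}
```

"Does diet matter at day 0?", "does diet matter at day 7?" and "does the diet effect change after infection?" are therefore three different tests on the same fit. Dropping the interaction (`~ Day + Diet`) assumes diet shifts each gene by the same amount on both days.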

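Also, drop either Day or Status. In this experiment they convey exactly the same information: every D0 sample is Uninfected and every D7 sample is Infected, so including both makes the design matrix not full rank.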
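
To see the confounding concretely, here is a small base-R sketch using the Liver samples from the table, rebuilt by hand (so the data frame is an assumption matching the printout). `qr()$rank` counts the independent columns of each design matrix; a rank below the column count means some coefficients cannot be estimated, and DESeq2 rejects such a design, reporting that the model matrix is not full rank. The same check shows that Outcome is fully determined by Day and Diet, since only infected diet-A mice died.

```r
# Liver samples from the table: 4 replicates per Diet x Day cell
liver <- data.frame(
  Day     = factor(rep(c("D0", "D7"), each = 8), levels = c("D0", "D7")),
  Status  = factor(rep(c("Uninfected", "Infected"), each = 8),
                   levels = c("Uninfected", "Infected")),
  Diet    = factor(rep(rep(c("B", "A"), each = 4), times = 2), levels = c("B", "A")),
  Outcome = factor(c(rep("Survived", 12), rep("Died", 4)),
                   levels = c("Survived", "Died"))
)

# Day and Status map one-to-one
print(table(liver$Day, liver$Status))

# Both in the design: the DayD7 and StatusInfected columns are identical
mm_both <- model.matrix(~ Day + Status, data = liver)
rank_both <- qr(mm_both)$rank
cat("columns:", ncol(mm_both), " rank:", rank_both, "\n")  # 3 columns, rank 2

# Outcome added to Day * Diet: OutcomeDied equals the DayD7:DietA column
mm_out <- model.matrix(~ Day * Diet + Outcome, data = liver)
cat("columns:", ncol(mm_out), " rank:", qr(mm_out)$rank, "\n")  # 5 columns, rank 4
```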

You wrote: "I started by subsetting the data to analyse each tissue and day separately (3 x 2 DE analyses)."

Subsetting by tissue is likely the best thing to do, but you probably shouldn't subset by day. Size factor estimation and gene-wise dispersion estimation will likely work well within a tissue across both days, but maybe not across tissues.
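
A minimal base-R sketch of analysing one tissue (Liver here) with both days kept together, showing two common ways to parameterize Diet x Day in DESeq2: a combined `group` factor compared through contrasts, or main effects plus an interaction. The `group` column and the `counts_liver` count matrix are hypothetical; the DESeq2 calls are left as comments so the block runs without Bioconductor. The arbitrary coefficients `b` only check that the interaction coefficient equals the difference between the two day-specific diet contrasts.

```r
# One tissue, both days together
liver <- data.frame(
  Day  = factor(rep(c("D0", "D7"), each = 8), levels = c("D0", "D7")),
  Diet = factor(rep(rep(c("B", "A"), each = 4), times = 2), levels = c("B", "A"))
)

# Option 1: one level per Day x Diet cell (hypothetical column name `group`)
liver$group <- factor(paste(liver$Day, liver$Diet, sep = "_"),
                      levels = c("D0_B", "D0_A", "D7_B", "D7_A"))
mm_group <- model.matrix(~ 0 + group, data = liver)
print(colnames(mm_group))
# Diet A vs B on each day, as contrast weights over the four groups
diet_at_D0 <- c(-1, 1, 0, 0)
diet_at_D7 <- c(0, 0, -1, 1)

# Option 2: main effects plus interaction
mm_int <- model.matrix(~ Day + Diet + Day:Diet, data = liver)
print(colnames(mm_int))

# Check with arbitrary coefficients: DayD7:DietA = (diet at D7) - (diet at D0)
b <- c(5, 1, 0.5, 2)
cell_means <- tapply(drop(mm_int %*% b), liver$group, mean)
print(sum((diet_at_D7 - diet_at_D0) * cell_means))  # equals b[4], i.e. 2

# With DESeq2 (not run; `counts_liver` is a hypothetical count matrix):
# dds <- DESeqDataSetFromMatrix(counts_liver, colData = liver, design = ~ group)
# dds <- DESeq(dds)
# res_D7 <- results(dds, contrast = c("group", "D7_A", "D7_B"))
```

Which contrast to test (diet at D0, diet at D7, or the change in the diet effect between days) depends on the question being asked, which is why the questions need to be pinned down first.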

You wrote: "especially for the mice that died (infected mice on diet A)."

How did this work? Unless you had animal techs watching the mice to see when they keeled over, I think the differences you find will mostly reflect which RNAs stay stable longer in dead tissue.

Honestly, I think you need to redo the whole thing and collect RNA from the infected diet A mice before they croak. Find some kind of phenotype you can use to measure disease severity, instead of waiting for them to keel over and harvesting their RNA who knows how long after death. I don't think your infected diet A data is worth anything, which really limits what you can do with this data set.
