Question

2 cells x 2 media x 2 conditions

1


dickson.russel ▴ 10

@dicksonrussel-14158

Last seen 7.2 years ago

I have an RNA-seq experiment with the following design:

cell line	media	treatment
cellline1	medium1	ctrl
cellline1	medium1	ctrl
cellline1	medium1	ctrl
cellline1	medium1	treated
cellline1	medium1	treated
cellline1	medium1	treated
cellline1	medium2	ctrl
cellline1	medium2	ctrl
cellline1	medium2	ctrl
cellline1	medium2	treated
cellline1	medium2	treated
cellline1	medium2	treated
cellline2	medium1	ctrl
cellline2	medium1	ctrl
cellline2	medium1	ctrl
cellline2	medium1	treated
cellline2	medium1	treated
cellline2	medium1	treated
cellline2	medium2	ctrl
cellline2	medium2	ctrl
cellline2	medium2	ctrl
cellline2	medium2	treated
cellline2	medium2	treated
cellline2	medium2	treated

To get genes responding more in medium2 of cellline1 and cellline2 due to treatment, I analyzed both cell lines separately with the following formula:

cell1 = media+treatment+media:treatment

cell2 = media+treatment+media:treatment

Q1. How do I fetch genes that respond more (more up- or down-regulated), less (less up- or down-regulated) or in the opposite direction (up in medium2 but down in medium1) in medium2 vs. medium1 of the individual cell lines?

I want to obtain genes which respond more (more up- or down-regulated) in medium2 (treated) of cellline2 vs. medium2 of cellline1. I used the following design:

I combined cell line and treatment as one factor (cell_treat): e.g., cellline1_ctrl, cellline1_treated

cell2medium2 = media+cell_treat+media:cell_treat

With this formula, I'm getting genes that are not even significant at nominal p-values (p<0.05) in the pairwise comparisons.

Q2. What am I doing wrong in this design?

Please help to improve my designs!

Thanks

deseq2 • 1.5k views

ADD COMMENT • link updated 8.2 years ago by Gavin Kelly ▴ 690 • written 8.2 years ago by dickson.russel ▴ 10

score 2 · Answer 1 · 2017-10-11

2


Gavin Kelly ▴ 690

@gavin-kelly-6944

Last seen 5.7 years ago

United Kingdom / London / Francis Crick…

For question 1, your model looks correct; you just need to pull out the interaction term. It should be evident from running 'resultsNames' which one you need: it will be the one whose name contains words from both the treatment and media levels.
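As a rough sketch of pulling out that interaction term for one cell line: the counts below are simulated with makeExampleDESeqDataSet purely so the example runs, and the factor levels ("control"/"treated", "medium1"/"medium2") are assumed from the coldata posted in this thread. With real data you would build the object from your own count matrix instead.

```r
library(DESeq2)

set.seed(1)
# Simulated counts for the 12 samples of a single cell line (illustration only)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$media     <- factor(rep(c("medium1", "medium2"), each = 6))
dds$treatment <- factor(rep(c("control", "treated"), each = 3, times = 2))

design(dds) <- ~ media + treatment + media:treatment
dds <- DESeq(dds)

# List the fitted coefficients, then pick the one naming both media and treatment
print(resultsNames(dds))
int_name <- grep("media.*treatment", resultsNames(dds), value = TRUE)

# Test the media:treatment interaction
# (difference in treatment effect, medium2 vs. medium1)
res_int <- results(dds, name = int_name)
summary(res_int)
```

A positive log2FoldChange here means the treated-vs-control log fold change is larger in medium2 than in medium1; on its own it does not say whether that is "more up", "less down" or a sign flip.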

For question 2, I can interpret it two ways. When you say "responding" "in medium2(treated) of cellline2 vs. medium2 of cellline1" do you mean

A) look at the fold-change of treated vs untreated in medium 2 of cellline 2, and compare it with the fold-change of treated vs untreated in medium 2 of cellline 1, or

B) compare medium 2 treated in cellline 2 directly with medium 2 treated in cellline 1.

I'm not sure what the parenthesised '(treated)' in your question indicates, so it's hard to resolve the exact question. For possibility (A), you're going to need a model (saving my typing by using abbreviated factor names) along the lines of ~ (M + C + T)^3, which will automatically take account of all possible interactions (or M + C + T + C:T + M:C:T, if you're happy ignoring the medium-treatment and cellline-medium interactions). If you relevel things so that medium2 is the baseline, it will be a bit easier to find the contrast you need, which will be the one corresponding to the C:T term. It might be conceptually easier to subset the data down to just medium2, at which point you're back to a two-way interaction, but keeping all the data in will be more powerful.
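A minimal sketch of possibility (A), under the assumption that the sample order matches the design table in the question and using simulated counts (makeExampleDESeqDataSet) only so that the code runs end to end:

```r
library(DESeq2)

set.seed(1)
# Simulated counts for all 24 samples (illustration only)
dds <- makeExampleDESeqDataSet(n = 500, m = 24)
dds$cellline  <- factor(rep(c("cellline1", "cellline2"), each = 12))
dds$media     <- factor(rep(c("medium1", "medium2"), each = 6, times = 2))
dds$treatment <- factor(rep(c("control", "treated"), each = 3, times = 4))

# Make medium2 the baseline so the C:T coefficient refers to medium2
dds$media <- relevel(dds$media, ref = "medium2")

# Full model: all main effects, two-way and three-way interactions
design(dds) <- ~ (media + cellline + treatment)^3
dds <- DESeq(dds)

# The cellline:treatment coefficient: difference in treatment effect between
# cellline2 and cellline1, evaluated in the baseline medium (medium2)
ct_name <- grep("^cellline.*\\.treatment", resultsNames(dds), value = TRUE)
res_ct  <- results(dds, name = ct_name)
```

The grep is just a convenience for locating the coefficient; checking resultsNames(dds) by eye is the safer habit.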

For possibility (B), it's probably easiest to concatenate the M C and T terms together, and then directly compare the two groups you've specified.
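For possibility (B), something along these lines should work. The group column name and the "_" separator are my own choices for illustration, and the counts are simulated so the sketch is runnable:

```r
library(DESeq2)

set.seed(1)
# Simulated counts for all 24 samples (illustration only)
dds <- makeExampleDESeqDataSet(n = 500, m = 24)
cellline  <- rep(c("cellline1", "cellline2"), each = 12)
media     <- rep(c("medium1", "medium2"), each = 6, times = 2)
treatment <- rep(c("control", "treated"), each = 3, times = 4)

# Concatenate the three factors into a single grouping factor with 8 levels
dds$group <- factor(paste(cellline, media, treatment, sep = "_"))
design(dds) <- ~ group
dds <- DESeq(dds)

# Direct comparison: numerator first, denominator second
res_b <- results(dds, contrast = c("group",
                                   "cellline2_medium2_treated",
                                   "cellline1_medium2_treated"))
```

With the contrast written in this order, a positive log2FoldChange means higher expression in cellline2 and a negative one means higher expression in cellline1.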

If you want to split the resulting genelists into the different classes (more up, up vs down...), then you're probably best off doing that after the test, by extracting the relevant linear combination of coefficients - if you provide some code that allows us to recreate your colData, then it will be easier for us to give specific guidance on this.
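To illustrate the "classify after the test" idea for a single cell line, here is a hedged sketch. The class labels, the 0.1 padj cut-off and the simulated counts are all assumptions made for the example. The treatment effect in medium1 is the T coefficient, and in medium2 it is the sum T + M:T, which results() can extract when given a list of coefficient names.

```r
library(DESeq2)

set.seed(1)
# Simulated counts for the 12 samples of one cell line (illustration only)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$media     <- factor(rep(c("medium1", "medium2"), each = 6))
dds$treatment <- factor(rep(c("control", "treated"), each = 3, times = 2))
design(dds) <- ~ media + treatment + media:treatment
dds <- DESeq(dds)

# Interaction test: does the treatment effect differ between media?
res_int <- results(dds, name = "mediamedium2.treatmenttreated")

# Treatment effect within each medium (log2 scale)
lfc_m1 <- results(dds, name = "treatment_treated_vs_control")$log2FoldChange
lfc_m2 <- results(dds, contrast = list(c("treatment_treated_vs_control",
                                         "mediamedium2.treatmenttreated")))$log2FoldChange

# Qualitative class of each gene, judged from the two fold changes
behaviour <- ifelse(sign(lfc_m1) != sign(lfc_m2), "opposite",
                    ifelse(abs(lfc_m2) > abs(lfc_m1), "more", "less"))

# Tabulate classes among genes with a significant interaction
sig <- which(res_int$padj < 0.1)
print(table(behaviour[sig]))
```

The simulated data contain no real interaction, so the table will usually be empty or nearly so; the point is only the mechanics of combining coefficients and labelling genes afterwards.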

ADD COMMENT • link 8.2 years ago Gavin Kelly ▴ 690

0


Gavin - thank you for your reply.

Q2: Sorry for not being very clear about my question 2. I want to obtain those genes which are responding more (more up- or more down-regulated) in [medium2(treated/control) vs medium1(treated/control) of cellline2] VS. [medium2(treated/control) vs medium1(treated/control) of cellline1], i.e.

Cellline2[medium2(treated/control samples) VS. medium1(treated/control samples)] VS. Cellline1[medium2(treated/control samples) VS. medium1(treated/control samples)]

"(treated/control)" i.e. comparison of treated samples vs. control samples

Regarding your suggestion (B), performing a pairwise comparison by combining the M, C and T terms together is very interesting. However, how do I get the direction of the expression change (whether the gene is up in cellline2 or cellline1)?

Please see below the code to recreate coldata:

coldata = data.frame(row.names = c('Control1.1', 'Control1.2', 'Control1.3',  'Treated1.1', 'Treated1.2', 'Treated1.3',
'Control2.1', 'Control2.2', 'Control2.3', 'Treated2.1', 'Treated2.2', 'Treated2.3',
'Control3.1', 'Control3.2', 'Control3.3',   'Treated3.1', 'Treated3.2', 'Treated3.3',
'Control4.1', 'Control4.2', 'Control4.3',   'Treated4.1', 'Treated4.2', 'Treated4.3'
),
# 12 samples per cell line
cellline = factor(rep(c("cellline1","cellline2"),each=12)),
# 6 samples per medium within each cell line
media = factor(rep(c("medium1","medium2"),each=6,times=2)),
# 3 controls followed by 3 treated within each cell line x medium block
treatment = factor(rep(c("control","treated"),each=3,times=4)))

Thank you for your time!

ADD REPLY • link 8.2 years ago dickson.russel ▴ 10

2


OK, it looks like neither of my guesses was correct then, as your expanded question refers to both cell-lines, both media, and both conditions. It's a third-order effect, so the design from my approach 'A' is correct, but you test the final, third-order coefficient: the one with an M-word, a C-word and a T-word in the resultsNames. Interpreting this is tricky. The first-order treatment effect is treat/control. The second-order MT effects allow us to determine the ratio of treatment effects between the two media, for each cell line (given, for media1 and media2 respectively, by the T expression and the T+MT expression). A positive number for MT could mean that treatment is having a more positive effect in media2, a less negative effect in media2, or a negative effect in media1 and a positive one in media2. So when we get to your required third-order effect, which is effectively the difference of your second-order MT effects between the two cell-lines, we've got combinatorially many more possible interpretations. You can either work out the interpretation long-hand, or 'cluster' your results by the signs of the individual coefficients that are provided by the fit: the different signatures (e.g. +++++-) will correspond to different qualitative behaviours.
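A sketch of the sign-signature idea, assuming the full three-way model and the sample order from the design table. The counts are simulated, the 0.1 padj cut-off is arbitrary, and treating a coefficient of exactly zero as "+" is my own simplification:

```r
library(DESeq2)

set.seed(1)
# Simulated counts for all 24 samples (illustration only)
dds <- makeExampleDESeqDataSet(n = 500, m = 24)
dds$cellline  <- factor(rep(c("cellline1", "cellline2"), each = 12))
dds$media     <- factor(rep(c("medium1", "medium2"), each = 6, times = 2))
dds$treatment <- factor(rep(c("control", "treated"), each = 3, times = 4))
design(dds) <- ~ (media + cellline + treatment)^3
dds <- DESeq(dds)

# Test the third-order coefficient
# (the one naming a media, a cellline and a treatment level)
three_way <- grep("^media.*\\.cellline.*\\.treatment",
                  resultsNames(dds), value = TRUE)
res3 <- results(dds, name = three_way)

# Sign signature of every non-intercept coefficient, one string per gene
betas <- coef(dds)[, -1]
signature <- apply(betas, 1, function(b)
  paste(ifelse(b >= 0, "+", "-"), collapse = ""))

# Group the significant genes by signature
sig <- which(res3$padj < 0.1)
print(table(signature[sig]))
```

The order of the signs follows the column order of coef(dds), so it is worth printing colnames(betas) alongside the table. Signs alone ignore effect size and uncertainty, so this is only a coarse first grouping.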

ADD REPLY • link 8.2 years ago Gavin Kelly ▴ 690

0


Gavin - Thank you for taking the time to explain in detail. I appreciate your time and help.

ADD REPLY • link 8.2 years ago dickson.russel ▴ 10