[DESeq2] Multi-factor designs
2
0
Entering edit mode
ofonov ▴ 10
@ofonov-10155
Last seen 7.5 years ago

I have RNA-seq dataset with 3 factors:

  • Treatment: control , treatment (2 levels)
  • Cell lines: ABC.1, Colo320DM, HCT.8, OU31, OVCAR4, RKO (6 levels)
  • Groups: I, II (2 levels, so that cell lines ABC.1, Colo320DM, OVCAR4, OU31 belong to group I and RKO, HCT.8 to group II).

I would like to find out following:

  1. List of genes which respond to treatment within groups I and II
  2. Genes (biomarkers) unique to each group (group I vs group II).

Which design formula would be appropriate for this case?

RNA-seq workflow at Bioconductor suggests to use

~ group + treatment + group:treatment

I have tried it and it does not seem to work, there are no results produced. I would like to add cell lines to the formula, so that e.g. cell line A untreated is compared to treated cell line A, but not to the bulk of cell lines belonging to one group. 

Is there a way of doing it in DESEq2?

 

deseq2 multifactorial design • 5.8k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Can you post this:

as.data.frame(colData(dds))

also make sure you are using the latest version of DESeq2 (v1.10): 

packageVersion("DESeq2")

ADD COMMENT
0
Entering edit mode
ofonov ▴ 10
@ofonov-10155
Last seen 7.5 years ago

Hi! Thanks for the reply, here is output of the commands:

> packageVersion("DESeq2")
[1] ‘1.10.1’

> as.data.frame(colData(dds))
   names cell.line treatment responders
1    1_1 Colo320DM   CONTROL       RESP
2    1_2 Colo320DM   CONTROL       RESP
3    1_3 Colo320DM   TREATED       RESP
4    1_4 Colo320DM   TREATED       RESP
5    2_1     ABC.1   CONTROL       RESP
6    2_2     ABC.1   CONTROL       RESP
7    2_3     ABC.1   TREATED       RESP
8    2_4     ABC.1   TREATED       RESP
9    3_1      OU31   CONTROL       RESP
10   3_2      OU31   CONTROL       RESP
11   3_3      OU31   TREATED       RESP
12   3_4      OU31   TREATED       RESP
13   4_1    OVCAR4   CONTROL       RESP
14   4_2    OVCAR4   CONTROL       RESP
15   4_3    OVCAR4   TREATED       RESP
16   4_4    OVCAR4   TREATED       RESP
17   8_1       RKO   CONTROL      NRESP
18   8_2       RKO   CONTROL      NRESP
19   8_3       RKO   TREATED      NRESP
20   8_4       RKO   TREATED      NRESP
21   9_1       HCT   CONTROL      NRESP
22   9_2       HCT   CONTROL      NRESP
23   9_3       HCT   TREATED      NRESP
24   9_4       HCT   TREATED      NRESP

ADD COMMENT
0
Entering edit mode

(A quick note on the support site: there are two ways to reply: Comments or Answers. If you want to start a threaded discussion you can use Comment. If you have an answer for the original post, you would use Answer.)

Where did the other 3 cell lines go? You said there were 9, but I only see 6.

For your aim to find "genes (biomarkers) unique to each group (group I vs group II)" do you want to find genes which have different response to treatment across group? Or just different baseline (control) expression across group? Or something else?

ADD REPLY
0
Entering edit mode

Hi, thanks for pointing out that there are 6 cell lines, for this analysis I have removed 3 cell lines which are known to be "partial responders" and can affect the analysis (I have corrected the original question).

I would like to find genes for which the change in regulation differs between responders and non-responders (across groups). For instance, genes that are upregulated in one group and donwregulated in another. I would like DEseq2 first to compare treated and untreated samples within cell lines and then compare responders vs non-responders.

It seems to me that it has to be done in a way, which is described in section 3.12.1 Linear combinations of DEseq2 vignette, where it is suggested to add ind.n row to colData, but I am not sure about the right way of doing this.

ADD REPLY
0
Entering edit mode

It just takes a little management, because the interaction term between group (responders) and cell.line produces some columns which are all zero and must be removed manually.

You should add a new column cell.line.nested to colData which has levels A,B,C,D for responders and levels A,B for non-responders. Now create a matrix with:

mm <- model.matrix(~ group + group:cell.line.nested + group:treatment, colData(dds))

Then go in and remove the columns with all zeros (see vignette section "Levels without samples" immediately following the linear combinations section).

You will then provide this modified model matrix to the 'full' argument:

dds <- DESeq(dds, full=mm)

You should be able to test the individual treatment effects in responders and non-responders using the 'name' argument, and you can contrast them using the list-style of the 'contrast' argument. And this controls for cell line effects, so you are comparing the treatment effects within cell lines.

ADD REPLY
0
Entering edit mode

HI Michael,  I'm working with a similar design where I have a model with 3 factors, two fixed factors, SEX (M, F) and DIET (Low and high). Then the random factor that is Strain 1, 2, 3.

I'm using DESeq2_1.12.4  

> as.data.frame(colData(dds))
             sex diet strain    name sizeFactor replaceable
C2          male  H     2   M3HPS  0.6105888       FALSE
C3          male  H     3   M6HPS  1.0305482       FALSE
C4        female  H     1   F2HPS  1.2311071       FALSE
C5        female  H     2   F3HPS  0.5622835       FALSE
C6        female  H      3   F6HPS  1.1050964       FALSE
C7          male  L     1   M2LPS  1.2206942        TRUE
C8          male  L     3   M6LPS  1.3228398        TRUE
C9          male  L     2   M3LPS  0.6330985        TRUE
C10       female  L     1   F2LPS  1.3312426        TRUE
C11       female  L      2   F3LPS  1.2465671        TRUE
C12       female  L      3   F6LPS  1.8290932        TRUE
Maria_2HM   male  H      1 M2HPS_b  0.7746767       FALSE
Maria_2LM   male  L     1 M2LPS_b  0.9076435        TRUE

I'm most interested in the effect of diet, so use LRT to analyze the effect diet, full = ~ sex + strain + diet , reduced = ~ sex + strain. However analyzing each factor I know there is a difference among the strains and between sexes. The difference among the strains will allow me to generalize the effect of the diets to the population.

I have also evaluated the effect of diet*strain

full = ~ sex + strain + diet + diet*strain , reduced = ~ sex + strain + diet

my references are: Female, High diet, strain 1

here I get the following

[1,] "Intercept"         
[2,] "sex_male_vs_female"
[3,] "strain_2_vs_1"     
[4,] "strain_3_vs_1"     
[5,] "diet_L_vs_H"   
[6,] "strain2.dietL"   
[7,] "strain3.dietL"    

How could I get the effect of diet within strain.  The effect of low diet vs high diet in strain 1? Low diet vs high diet in strain 2?Low diet vs high diet in strain 3? . I would like to see the effect of the diet in females vs males.  Any help will be more than appreciated.

 

 

 

ADD REPLY
0
Entering edit mode

hi,

I'd recommend you speak with a local statistician to discuss the interpretation of coefficients and how to pull out your comparisons of interest. There are many possible designs to use here, and the modeling choices are up to really.

ADD REPLY

Login before adding your answer.

Traffic: 641 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6