Question

[DESeq2] Multi-factor designs

0

Entering edit mode

ofonov ▴ 10

@ofonov-10155

Last seen 6.8 years ago

I have RNA-seq dataset with 3 factors:

Treatment: control , treatment (2 levels)
Cell lines: ABC.1, Colo320DM, HCT.8, OU31, OVCAR4, RKO (6 levels)
Groups: I, II (2 levels, so that cell lines ABC.1, Colo320DM, OVCAR4, OU31 belong to group I and RKO, HCT.8 to group II).

I would like to find out following:

List of genes which respond to treatment within groups I and II
Genes (biomarkers) unique to each group (group I vs group II).

Which design formula would be appropriate for this case?

RNA-seq workflow at Bioconductor suggests to use

~ group + treatment + group:treatment

I have tried it and it does not seem to work, there are no results produced. I would like to add cell lines to the formula, so that e.g. cell line A untreated is compared to treated cell line A, but not to the bulk of cell lines belonging to one group.

Is there a way of doing it in DESEq2?

deseq2 multifactorial design • 5.5k views

ADD COMMENT • link 8.0 years ago ofonov ▴ 10

score 1 · Answer 1 · 2016-04-21

1

Entering edit mode

Michael Love 41k

@mikelove

Last seen 20 hours ago

United States

Can you post this:

as.data.frame(colData(dds))

also make sure you are using the latest version of DESeq2 (v1.10):

packageVersion("DESeq2")

ADD COMMENT • link 8.0 years ago Michael Love 41k

score 0 · Answer 2 · 2016-04-22

0

Entering edit mode

ofonov ▴ 10

@ofonov-10155

Last seen 6.8 years ago

Hi! Thanks for the reply, here is output of the commands:

> packageVersion("DESeq2") [1] ‘1.10.1’

> as.data.frame(colData(dds)) names cell.line treatment responders 1 1_1 Colo320DM CONTROL RESP 2 1_2 Colo320DM CONTROL RESP 3 1_3 Colo320DM TREATED RESP 4 1_4 Colo320DM TREATED RESP 5 2_1 ABC.1 CONTROL RESP 6 2_2 ABC.1 CONTROL RESP 7 2_3 ABC.1 TREATED RESP 8 2_4 ABC.1 TREATED RESP 9 3_1 OU31 CONTROL RESP 10 3_2 OU31 CONTROL RESP 11 3_3 OU31 TREATED RESP 12 3_4 OU31 TREATED RESP 13 4_1 OVCAR4 CONTROL RESP 14 4_2 OVCAR4 CONTROL RESP 15 4_3 OVCAR4 TREATED RESP 16 4_4 OVCAR4 TREATED RESP 17 8_1 RKO CONTROL NRESP 18 8_2 RKO CONTROL NRESP 19 8_3 RKO TREATED NRESP 20 8_4 RKO TREATED NRESP 21 9_1 HCT CONTROL NRESP 22 9_2 HCT CONTROL NRESP 23 9_3 HCT TREATED NRESP 24 9_4 HCT TREATED NRESP

ADD COMMENT • link 8.0 years ago ofonov ▴ 10

0

Entering edit mode

(A quick note on the support site: there are two ways to reply: Comments or Answers. If you want to start a threaded discussion you can use Comment. If you have an answer for the original post, you would use Answer.)

Where did the other 3 cell lines go? You said there were 9, but I only see 6.

For your aim to find "genes (biomarkers) unique to each group (group I vs group II)" do you want to find genes which have different response to treatment across group? Or just different baseline (control) expression across group? Or something else?

ADD REPLY • link 8.0 years ago Michael Love 41k

0

Entering edit mode

Hi, thanks for pointing out that there are 6 cell lines, for this analysis I have removed 3 cell lines which are known to be "partial responders" and can affect the analysis (I have corrected the original question).

I would like to find genes for which the change in regulation differs between responders and non-responders (across groups). For instance, genes that are upregulated in one group and donwregulated in another. I would like DEseq2 first to compare treated and untreated samples within cell lines and then compare responders vs non-responders.

It seems to me that it has to be done in a way, which is described in section 3.12.1 Linear combinations of DEseq2 vignette, where it is suggested to add ind.n row to colData, but I am not sure about the right way of doing this.

ADD REPLY • link 8.0 years ago ofonov ▴ 10

0

Entering edit mode

It just takes a little management, because the interaction term between group (responders) and cell.line produces some columns which are all zero and must be removed manually.

You should add a new column cell.line.nested to colData which has levels A,B,C,D for responders and levels A,B for non-responders. Now create a matrix with:

mm <- model.matrix(~ group + group:cell.line.nested + group:treatment, colData(dds))

Then go in and remove the columns with all zeros (see vignette section "Levels without samples" immediately following the linear combinations section).

You will then provide this modified model matrix to the 'full' argument:

dds <- DESeq(dds, full=mm)

You should be able to test the individual treatment effects in responders and non-responders using the 'name' argument, and you can contrast them using the list-style of the 'contrast' argument. And this controls for cell line effects, so you are comparing the treatment effects within cell lines.

ADD REPLY • link 8.0 years ago Michael Love 41k

0

Entering edit mode

HI Michael, I'm working with a similar design where I have a model with 3 factors, two fixed factors, SEX (M, F) and DIET (Low and high). Then the random factor that is Strain 1, 2, 3.

I'm using DESeq2_1.12.4

> as.data.frame(colData(dds))
sex diet strain name sizeFactor replaceable
C2 male H 2 M3HPS 0.6105888 FALSE
C3 male H 3 M6HPS 1.0305482 FALSE
C4 female H 1 F2HPS 1.2311071 FALSE
C5 female H 2 F3HPS 0.5622835 FALSE
C6 female H 3 F6HPS 1.1050964 FALSE
C7 male L 1 M2LPS 1.2206942 TRUE
C8 male L 3 M6LPS 1.3228398 TRUE
C9 male L 2 M3LPS 0.6330985 TRUE
C10 female L 1 F2LPS 1.3312426 TRUE
C11 female L 2 F3LPS 1.2465671 TRUE
C12 female L 3 F6LPS 1.8290932 TRUE
Maria_2HM male H 1 M2HPS_b 0.7746767 FALSE
Maria_2LM male L 1 M2LPS_b 0.9076435 TRUE

I'm most interested in the effect of diet, so use LRT to analyze the effect diet, full = ~ sex + strain + diet , reduced = ~ sex + strain. However analyzing each factor I know there is a difference among the strains and between sexes. The difference among the strains will allow me to generalize the effect of the diets to the population.

I have also evaluated the effect of diet*strain

full = ~ sex + strain + diet + diet*strain , reduced = ~ sex + strain + diet

my references are: Female, High diet, strain 1

here I get the following

[1,] "Intercept"
[2,] "sex_male_vs_female"
[3,] "strain_2_vs_1"
[4,] "strain_3_vs_1"
[5,] "diet_L_vs_H"
[6,] "strain2.dietL"
[7,] "strain3.dietL"

How could I get the effect of diet within strain. The effect of low diet vs high diet in strain 1? Low diet vs high diet in strain 2?Low diet vs high diet in strain 3? . I would like to see the effect of the diet in females vs males. Any help will be more than appreciated.

ADD REPLY • link 7.1 years ago Toja • 0

0

Entering edit mode

hi,

I'd recommend you speak with a local statistician to discuss the interpretation of coefficients and how to pull out your comparisons of interest. There are many possible designs to use here, and the modeling choices are up to really.

ADD REPLY • link 7.1 years ago Michael Love 41k