I have RNA-seq dataset with 3 factors:
- Treatment: control , treatment (2 levels)
- Cell lines: ABC.1, Colo320DM, HCT.8, OU31, OVCAR4, RKO (6 levels)
- Groups: I, II (2 levels, so that cell lines ABC.1, Colo320DM, OVCAR4, OU31 belong to group I and RKO, HCT.8 to group II).
I would like to find out following:
- List of genes which respond to treatment within groups I and II
- Genes (biomarkers) unique to each group (group I vs group II).
Which design formula would be appropriate for this case?
RNA-seq workflow at Bioconductor suggests to use
~ group + treatment + group:treatment
I have tried it and it does not seem to work, there are no results produced. I would like to add cell lines to the formula, so that e.g. cell line A untreated is compared to treated cell line A, but not to the bulk of cell lines belonging to one group.
Is there a way of doing it in DESEq2?
(A quick note on the support site: there are two ways to reply: Comments or Answers. If you want to start a threaded discussion you can use Comment. If you have an answer for the original post, you would use Answer.)
Where did the other 3 cell lines go? You said there were 9, but I only see 6.
For your aim to find "genes (biomarkers) unique to each group (group I vs group II)" do you want to find genes which have different response to treatment across group? Or just different baseline (control) expression across group? Or something else?
Hi, thanks for pointing out that there are 6 cell lines, for this analysis I have removed 3 cell lines which are known to be "partial responders" and can affect the analysis (I have corrected the original question).
I would like to find genes for which the change in regulation differs between responders and non-responders (across groups). For instance, genes that are upregulated in one group and donwregulated in another. I would like DEseq2 first to compare treated and untreated samples within cell lines and then compare responders vs non-responders.
It seems to me that it has to be done in a way, which is described in section 3.12.1 Linear combinations of DEseq2 vignette, where it is suggested to add ind.n row to colData, but I am not sure about the right way of doing this.
It just takes a little management, because the interaction term between group (responders) and cell.line produces some columns which are all zero and must be removed manually.
You should add a new column cell.line.nested to colData which has levels A,B,C,D for responders and levels A,B for non-responders. Now create a matrix with:
mm <- model.matrix(~ group + group:cell.line.nested + group:treatment, colData(dds))
Then go in and remove the columns with all zeros (see vignette section "Levels without samples" immediately following the linear combinations section).
You will then provide this modified model matrix to the 'full' argument:
You should be able to test the individual treatment effects in responders and non-responders using the 'name' argument, and you can contrast them using the list-style of the 'contrast' argument. And this controls for cell line effects, so you are comparing the treatment effects within cell lines.
HI Michael, I'm working with a similar design where I have a model with 3 factors, two fixed factors, SEX (M, F) and DIET (Low and high). Then the random factor that is Strain 1, 2, 3.
I'm using DESeq2_1.12.4
> as.data.frame(colData(dds))
sex diet strain name sizeFactor replaceable
C2 male H 2 M3HPS 0.6105888 FALSE
C3 male H 3 M6HPS 1.0305482 FALSE
C4 female H 1 F2HPS 1.2311071 FALSE
C5 female H 2 F3HPS 0.5622835 FALSE
C6 female H 3 F6HPS 1.1050964 FALSE
C7 male L 1 M2LPS 1.2206942 TRUE
C8 male L 3 M6LPS 1.3228398 TRUE
C9 male L 2 M3LPS 0.6330985 TRUE
C10 female L 1 F2LPS 1.3312426 TRUE
C11 female L 2 F3LPS 1.2465671 TRUE
C12 female L 3 F6LPS 1.8290932 TRUE
Maria_2HM male H 1 M2HPS_b 0.7746767 FALSE
Maria_2LM male L 1 M2LPS_b 0.9076435 TRUE
I'm most interested in the effect of diet, so use LRT to analyze the effect diet, full = ~ sex + strain + diet , reduced = ~ sex + strain. However analyzing each factor I know there is a difference among the strains and between sexes. The difference among the strains will allow me to generalize the effect of the diets to the population.
I have also evaluated the effect of diet*strain
full = ~ sex + strain + diet + diet*strain , reduced = ~ sex + strain + diet
my references are: Female, High diet, strain 1
here I get the following
[1,] "Intercept"
[2,] "sex_male_vs_female"
[3,] "strain_2_vs_1"
[4,] "strain_3_vs_1"
[5,] "diet_L_vs_H"
[6,] "strain2.dietL"
[7,] "strain3.dietL"
How could I get the effect of diet within strain. The effect of low diet vs high diet in strain 1? Low diet vs high diet in strain 2?Low diet vs high diet in strain 3? . I would like to see the effect of the diet in females vs males. Any help will be more than appreciated.
hi,
I'd recommend you speak with a local statistician to discuss the interpretation of coefficients and how to pull out your comparisons of interest. There are many possible designs to use here, and the modeling choices are up to really.