How to formulate 2 factor nested design in DESeq2
1
0
Entering edit mode
JK Kim ▴ 20
@jk-kim-14168
Last seen 8 months ago
United States

Hello,

I have a bulkRNAseq dataset which looks like this. The variable Litter (factor) is nested in the variable Treatment. What I would like to know is how to formulate a design for this experiment - gene expression between the drug and the control while controlling the Litter effect. I read Group-specific condition effects, individuals nested within groups ,?results from DESeq2, tutorial_deseq2_contrast, RNA-Seq workflow: gene-level exploratory analysis and differential expression and some other posts related to nested design in support.bioconductor.org but I still have no clues...

Could anyone help me how to formulate a design for the experiment, please? I am even not sure the only design I can do for this experiment is just (~ + Treatment).


meta_df <- data.frame(
  stringsAsFactors = FALSE,
                  sample_name = c("T1","T2",
                               "T3","T4","T5","T6","T7","T8","T9","C1","C2",
                               "C3","C4","C5","C6","C7","C8"),
                 Treatment = c("drug","drug",
                               "drug","drug","drug","drug","drug","drug",
                               "drug","control","control","control",
                               "control","control","control","control","control"),
                    Litter = c(1L,1L,1L,2L,
                               2L,2L,3L,3L,3L,4L,4L,4L,5L,5L,5L,6L,
                               6L)
        )

meta_df$Treatment <- factor(meta_df$Treatment, levels = c("control", "drug"))
meta_df$Litter <- factor(meta_df$Litter, levels = c(1,2,3,4,5,6))
meta_df <- meta_df %>% column_to_rownames("sample_name")

My meta data looks like this.

meta_df

    Treatment Litter
T1      drug      1
T2      drug      1
T3      drug      1
T4      drug      2
T5      drug      2
T6      drug      2
T7      drug      3
T8      drug      3
T9      drug      3
C1   control      4
C2   control      4
C3   control      4
C4   control      5
C5   control      5
C6   control      5
C7   control      6
C8   control      6

I tried this, but I am even not sure this is right...

meta_df$test <- factor(c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3))
meta_df
   Treatment Litter test
T1      drug      1    1
T2      drug      1    1
T3      drug      1    1
T4      drug      2    2
T5      drug      2    2
T6      drug      2    2
T7      drug      3    3
T8      drug      3    3
T9      drug      3    3
C1   control      4    1
C2   control      4    1
C3   control      4    1
C4   control      5    2
C5   control      5    2
C6   control      5    2
C7   control      6    3
C8   control      6    3
colData(dds) <- DataFrame(meta_df)
dds <- DESeqDataSet(dds, design = ~ + Treatment + Treatment:test)
dds <- DESeq(dds)
res <- results(dds)
resultsNames(dds)
[1] "Intercept"                 "Treatment_drug_vs_control" "Treatmentcontrol.test2"    "Treatmentdrug.test2"       "Treatmentcontrol.test3"   
[6] "Treatmentdrug.test3"
nested_design RNASeqData DESeq2 • 1.3k views
ADD COMMENT
1
Entering edit mode
jeroen.gilis ▴ 90
@jeroengilis-21551
Last seen 3 months ago
Belgium

Hi JK Kim,

What you would want is to fit a model that includes Litter as a fixed covariate in the model, ~ Litter + Treatment, or a model that additionally allows for interactions between Litter and Treatment, ~ Litter * Treatment.

However, as you might have discovered already, you will not be able to fit this model with your data. The problem is that each Litter only exists in one Treatment group (i.e., Litter 1 only appears in Treatment 1, Litter 4 in Treatment 2, ...). In other words, your Treatment effect is confounded by the Litter effect. As a consequence, it is not possible to estimate the effect of Treatment within Litter. This would however be necessary to fit the aforementioned models.

If your Litter variable would have looked your like self-defined test variable, your problem would have been solved and you could fit the aforementioned models.

So in brief, I'm afraid the experimental design of your data does not allow for an analysis that correctly estimates both the within- and between-litter error. As a result, the results of your DE analysis will inevitably be biased.

Kind regards,

Jeroen

ADD COMMENT
1
Entering edit mode

Maybe to add, one possible way of analysing your data is by using mixed models. While mixed models are not implemented in DESeq2, there are packages in Bioconductor that do implement this, like http://bioconductor.org/packages/release/bioc/html/variancePartition.html -> http://bioconductor.org/packages/release/bioc/vignettes/variancePartition/inst/doc/dream.html.

ADD REPLY
1
Entering edit mode

Thank you. Yes, the design matrix is not full rank when I do ~ + Litter + Treatment or Litter:Treatment. hmmm. I will look into the links you shared. Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 731 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6