Question

How to formulate 2 factor nested design in DESeq2

0

Entering edit mode

JK Kim ▴ 20

@jk-kim-14168

Last seen 8 months ago

United States

Hello,

I have a bulkRNAseq dataset which looks like this. The variable Litter (factor) is nested in the variable Treatment. What I would like to know is how to formulate a design for this experiment - gene expression between the drug and the control while controlling the Litter effect. I read Group-specific condition effects, individuals nested within groups ,?results from DESeq2, tutorial_deseq2_contrast, RNA-Seq workflow: gene-level exploratory analysis and differential expression and some other posts related to nested design in support.bioconductor.org but I still have no clues...

Could anyone help me how to formulate a design for the experiment, please? I am even not sure the only design I can do for this experiment is just (~ + Treatment).


meta_df <- data.frame(
  stringsAsFactors = FALSE,
                  sample_name = c("T1","T2",
                               "T3","T4","T5","T6","T7","T8","T9","C1","C2",
                               "C3","C4","C5","C6","C7","C8"),
                 Treatment = c("drug","drug",
                               "drug","drug","drug","drug","drug","drug",
                               "drug","control","control","control",
                               "control","control","control","control","control"),
                    Litter = c(1L,1L,1L,2L,
                               2L,2L,3L,3L,3L,4L,4L,4L,5L,5L,5L,6L,
                               6L)
        )

meta_df$Treatment <- factor(meta_df$Treatment, levels = c("control", "drug"))
meta_df$Litter <- factor(meta_df$Litter, levels = c(1,2,3,4,5,6))
meta_df <- meta_df %>% column_to_rownames("sample_name")

My meta data looks like this.

meta_df

    Treatment Litter
T1      drug      1
T2      drug      1
T3      drug      1
T4      drug      2
T5      drug      2
T6      drug      2
T7      drug      3
T8      drug      3
T9      drug      3
C1   control      4
C2   control      4
C3   control      4
C4   control      5
C5   control      5
C6   control      5
C7   control      6
C8   control      6

I tried this, but I am even not sure this is right...

meta_df$test <- factor(c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3))
meta_df

   Treatment Litter test
T1      drug      1    1
T2      drug      1    1
T3      drug      1    1
T4      drug      2    2
T5      drug      2    2
T6      drug      2    2
T7      drug      3    3
T8      drug      3    3
T9      drug      3    3
C1   control      4    1
C2   control      4    1
C3   control      4    1
C4   control      5    2
C5   control      5    2
C6   control      5    2
C7   control      6    3
C8   control      6    3

colData(dds) <- DataFrame(meta_df)
dds <- DESeqDataSet(dds, design = ~ + Treatment + Treatment:test)
dds <- DESeq(dds)
res <- results(dds)
resultsNames(dds)

[1] "Intercept"                 "Treatment_drug_vs_control" "Treatmentcontrol.test2"    "Treatmentdrug.test2"       "Treatmentcontrol.test3"   
[6] "Treatmentdrug.test3"

nested_design RNASeqData DESeq2 • 1.3k views

ADD COMMENT • link 22 months ago JK Kim ▴ 20

score 1 · Answer 1 · 2022-06-01

Hi JK Kim,

What you would want is to fit a model that includes Litter as a fixed covariate in the model, ~ Litter + Treatment, or a model that additionally allows for interactions between Litter and Treatment, ~ Litter * Treatment.

However, as you might have discovered already, you will not be able to fit this model with your data. The problem is that each Litter only exists in one Treatment group (i.e., Litter 1 only appears in Treatment 1, Litter 4 in Treatment 2, ...). In other words, your Treatment effect is confounded by the Litter effect. As a consequence, it is not possible to estimate the effect of Treatment within Litter. This would however be necessary to fit the aforementioned models.

If your Litter variable would have looked your like self-defined test variable, your problem would have been solved and you could fit the aforementioned models.

So in brief, I'm afraid the experimental design of your data does not allow for an analysis that correctly estimates both the within- and between-litter error. As a result, the results of your DE analysis will inevitably be biased.

Kind regards,

Jeroen