DESeq2 - nested samples (individuals) - time course - conditions
Entering edit mode
Last seen 2.9 years ago


I am using DESeq2, but unsure whether or not I am using the right design. I could not find here anything similar.

subject: A, B, C, D, E, F cohort: ctrl, treatment time: 0, 1

As you can see subjects belong to either cohort (ctrl or treatment) and each subject is sampled at multiple time points.

Is there a way in DESeq2 to "block" for subjects? In my case (my data) base counts can vary substantially among subjects, and I am trying to find genes that behave differently in time, depending on the treatment (cohort).

Q1: Would this design work for me?

design(dds) <- ~ time + cohort + time:cohort
dds <- DESeq(dds, test="LRT", reduced = ~ time)

Q2: How can I implement the "blocking" of subjects? If you are going to suggest duplicateCorr from voom/limma, at what point of the DESeq2 analysis should I do that?

Q3: Do I really need to exclude samples when cohorts are unbalanced (e.g. 30 subjects vs 24 ?) It would be better if DESeq2 could do this, for instance by randomly excluding 6 subjects and re-running this analysis multiple times, rather than me manually discarding data from 6 samples... not sure if possible..?

extra Q : Would I also be able to find genes going up/down the same direction in both cohorts, when the slope is higher in one cohort than in the other? would those also be detected as DE?

Thank you for taking the time! Dany

deseq2 nested de • 537 views
Entering edit mode
Last seen 2 hours ago
United States

This is answered in the vignette, see “individual nested within groups”.

Entering edit mode

Hi Michael,

I did try to apply the example from the vignette, but my output is not making sense when I plot it. For instance, I am getting genes that are DE at the same time point, while I am looking for genes that behave different in time between cohorts. This is my code so far:

m3 <- model.matrix(~ cohort + cohort:ind.n + cohort:date, new_coldata)
m4 <- model.matrix(~ date, new_coldata)

dds3 <- DESeqDataSetFromMatrix(countData = dds2,
                          colData = new_coldata,
                          design = m3)

dds3$cohort <- relevel(dds3$cohort, ref = "Control")

cts <- counts(dds3)
geoMeans2 <- apply(cts, 1, function(row) if (all(row == 0)) 0 else exp(sum(log(row[row != 0]))/length(row)))
dds3 <- estimateSizeFactors(dds3, geoMeans=geoMeans2)
eff <- DESeq(dds, "LRT", full = m1, reduced = m2)

this is what I get with resultsNames(eff): [1] "Intercept" "cohortNeomycin" "cohortControl.ind.n2" "cohortNeomycin.ind.n2" [5] "cohortControl.ind.n3" "cohortNeomycin.ind.n3" "cohortControl.ind.n4" "cohortNeomycin.ind.n4" [9] "cohortControl.ind.n5" "cohortNeomycin.ind.n5" "cohortControl.ind.n6" "cohortNeomycin.ind.n6" [13] "cohortControl.ind.n7" "cohortNeomycin.ind.n7" "cohortControl.ind.n8" "cohortNeomycin.ind.n8" [17] "cohortControl.ind.n9" "cohortNeomycin.ind.n9" "cohortControl.ind.n10" "cohortNeomycin.ind.n10" [21] "cohortControl.ind.n11" "cohortNeomycin.ind.n11" "cohortControl.ind.n12" "cohortNeomycin.ind.n12" [25] "cohortControl.ind.n13" "cohortNeomycin.ind.n13" "cohortControl.ind.n14" "cohortNeomycin.ind.n14" [29] "cohortControl.ind.n15" "cohortNeomycin.ind.n15" "cohortControl.ind.n16" "cohortNeomycin.ind.n16" [33] "cohortControl.ind.n17" "cohortNeomycin.ind.n17" "cohortControl.ind.n18" "cohortNeomycin.ind.n18" [37] "cohortControl.ind.n19" "cohortNeomycin.ind.n19" "cohortControl.ind.n20" "cohortNeomycin.ind.n20" [41] "cohortControl.ind.n21" "cohortNeomycin.ind.n21" "cohortControl.ind.n22" "cohortNeomycin.ind.n22" [45] "cohortControl.ind.n23" "cohortNeomycin.ind.n23" "cohortControl.datet2" "cohortNeomycin.datet2"

eff3 <- DESeq(dds3, "LRT", full = m3, reduced = m4)

Could you please tell me if the analysis above is correct to the scope of finding DE genes between cohorts? Then at least I would know if the problem is the setup or if it lies within my data.

thank you

Entering edit mode

I would recommend speaking to a statistician about how to formulate your hypotheses.

Above the LRT you are using (also read the vignette section on the LRT), is asking for which genes are the cohort, individual, or cohort-specific date effects not equal to zero. I'm sure that's actually a lot of genes, and not the ones you are interested in. This is a really important part of the analysis, maybe more important than e.g. how the sequencing was done, so it's worth a conversation with a statistician or someone who is familiar with linear models and testing.


Login before adding your answer.

Traffic: 377 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6