Question: DESeq2 - nested samples (individuals) - time course - conditions
0
gravatar for gaio.transposon
22 days ago by
gaio.transposon0 wrote:

Hi,

I am using DESeq2, but unsure whether or not I am using the right design. I could not find here anything similar.

subject: A, B, C, D, E, F cohort: ctrl, treatment time: 0, 1

As you can see subjects belong to either cohort (ctrl or treatment) and each subject is sampled at multiple time points.

Is there a way in DESeq2 to "block" for subjects? In my case (my data) base counts can vary substantially among subjects, and I am trying to find genes that behave differently in time, depending on the treatment (cohort).

Q1: Would this design work for me?

design(dds) <- ~ time + cohort + time:cohort
dds <- DESeq(dds, test="LRT", reduced = ~ time)

Q2: How can I implement the "blocking" of subjects? If you are going to suggest duplicateCorr from voom/limma, at what point of the DESeq2 analysis should I do that?

Q3: Do I really need to exclude samples when cohorts are unbalanced (e.g. 30 subjects vs 24 ?) It would be better if DESeq2 could do this, for instance by randomly excluding 6 subjects and re-running this analysis multiple times, rather than me manually discarding data from 6 samples... not sure if possible..?

extra Q : Would I also be able to find genes going up/down the same direction in both cohorts, when the slope is higher in one cohort than in the other? would those also be detected as DE?

Thank you for taking the time! Dany

deseq2 nested de • 94 views
ADD COMMENTlink modified 22 days ago by Michael Love26k • written 22 days ago by gaio.transposon0
Answer: DESeq2 - nested samples (individuals) - time course - conditions
0
gravatar for Michael Love
22 days ago by
Michael Love26k
United States
Michael Love26k wrote:

This is answered in the vignette, see “individual nested within groups”.

ADD COMMENTlink written 22 days ago by Michael Love26k

Hi Michael,

I did try to apply the example from the vignette, but my output is not making sense when I plot it. For instance, I am getting genes that are DE at the same time point, while I am looking for genes that behave different in time between cohorts. This is my code so far:

m3 <- model.matrix(~ cohort + cohort:ind.n + cohort:date, new_coldata)
m4 <- model.matrix(~ date, new_coldata)

dds3 <- DESeqDataSetFromMatrix(countData = dds2,
                          colData = new_coldata,
                          design = m3)

dds3$cohort <- relevel(dds3$cohort, ref = "Control")

cts <- counts(dds3)
geoMeans2 <- apply(cts, 1, function(row) if (all(row == 0)) 0 else exp(sum(log(row[row != 0]))/length(row)))
dds3 <- estimateSizeFactors(dds3, geoMeans=geoMeans2)
eff <- DESeq(dds, "LRT", full = m1, reduced = m2)

this is what I get with resultsNames(eff): [1] "Intercept" "cohortNeomycin" "cohortControl.ind.n2" "cohortNeomycin.ind.n2" [5] "cohortControl.ind.n3" "cohortNeomycin.ind.n3" "cohortControl.ind.n4" "cohortNeomycin.ind.n4" [9] "cohortControl.ind.n5" "cohortNeomycin.ind.n5" "cohortControl.ind.n6" "cohortNeomycin.ind.n6" [13] "cohortControl.ind.n7" "cohortNeomycin.ind.n7" "cohortControl.ind.n8" "cohortNeomycin.ind.n8" [17] "cohortControl.ind.n9" "cohortNeomycin.ind.n9" "cohortControl.ind.n10" "cohortNeomycin.ind.n10" [21] "cohortControl.ind.n11" "cohortNeomycin.ind.n11" "cohortControl.ind.n12" "cohortNeomycin.ind.n12" [25] "cohortControl.ind.n13" "cohortNeomycin.ind.n13" "cohortControl.ind.n14" "cohortNeomycin.ind.n14" [29] "cohortControl.ind.n15" "cohortNeomycin.ind.n15" "cohortControl.ind.n16" "cohortNeomycin.ind.n16" [33] "cohortControl.ind.n17" "cohortNeomycin.ind.n17" "cohortControl.ind.n18" "cohortNeomycin.ind.n18" [37] "cohortControl.ind.n19" "cohortNeomycin.ind.n19" "cohortControl.ind.n20" "cohortNeomycin.ind.n20" [41] "cohortControl.ind.n21" "cohortNeomycin.ind.n21" "cohortControl.ind.n22" "cohortNeomycin.ind.n22" [45] "cohortControl.ind.n23" "cohortNeomycin.ind.n23" "cohortControl.datet2" "cohortNeomycin.datet2"

eff3 <- DESeq(dds3, "LRT", full = m3, reduced = m4)

Could you please tell me if the analysis above is correct to the scope of finding DE genes between cohorts? Then at least I would know if the problem is the setup or if it lies within my data.

thank you

ADD REPLYlink written 21 days ago by gaio.transposon0

I would recommend speaking to a statistician about how to formulate your hypotheses.

Above the LRT you are using (also read the vignette section on the LRT), is asking for which genes are the cohort, individual, or cohort-specific date effects not equal to zero. I'm sure that's actually a lot of genes, and not the ones you are interested in. This is a really important part of the analysis, maybe more important than e.g. how the sequencing was done, so it's worth a conversation with a statistician or someone who is familiar with linear models and testing.

ADD REPLYlink written 21 days ago by Michael Love26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 139 users visited in the last hour