Question

Perform time course rna seq analysis at once with a single design or separate differential analysis for each time point

0

Entering edit mode

adhikb • 0

@adhikb-20158

Last seen 5.8 years ago

Hi all, I have time course data: 0, 2, 4, 6, and 8 hours. 0 hr has 6 control samples. All other time points have 6 control and 6 treated samples. Total 54 samples. (The 6 replicates are basically 6 different donors.)

Here is what I used to compute DE for different time points using contrast.

dds <- DESeqDataSetFromMatrix(countData = counts, colData = samples, design = ~group+donor)
dds <- estimateSizeFactors(dds)
nc <- counts(dds, normalized=TRUE)
filter <- rowSums(nc >= 10) >= 6  
dds <- dds[filter,]
dds<-DESeq(dds)
results(dds, contrast=c('group','T8','C8'), alpha=0.05)
summary(results)

I am filtering counts only to get rid of genes with low counts, I am assuming the independent filtering will choose genes with high power. note: One of my samples in the treated group happen to have way too many reads than other samples in the same group.

With my results, many genes that are reported as highly differentially expressed have counts that don't make sense to me (I would assume those would be filtered out). For example, few of the genes reported as highly differentially expressed are given in this link: genes with read counts

Below are my concerns:

Should these genes not be filtered out? I am guessing the reason they are being reported as DE is due to the fact that these genes have very high count for 1 sample in the treated group and it increases the average expression of that gene in the treated group??
How should I filter my data such that genes like these are not reported as differentially expressed? If I were comparing only 2 groups with a design for 1 particular time point (control 8hr vs trt 8hr), I would filter genes such that at least 6 of my samples out of 12 express at least 10 read counts or 1 cpm (this will not report those genes that I saw with my analysis). But with a design that has all time points, what should I do to exclude genes such as reported above?
For this time course data, I am particularly interested in differences between two groups at each time point. So, should I still compute DE using a single model -a design including all 54 samples together? or is it just fine to compute DEs separately using 4 different models at different time points? Are there any pros and cons either ways?

Thanks!

deseq2 design • 1.3k views

ADD COMMENT • link updated 6.7 years ago by Michael Love 43k • written 6.7 years ago by adhikb • 0

score 0 · Answer 1 · 2019-06-13

Can you try a few things?

One approach would be to use a design of ~donor + time + time:treatment. Where time is a factor with levels 0-8, and treatment is a factor as well. You have to do a little manipulation of the model matrix though. If you do:

mm <- model.matrix(design(dds), colData(dds)

You'll see there is a column of all 0's for the treatment at time 0 (no such samples exist). You can remove this column, and then supply the fixed mm to the full argument, and supply a reduced model matrix to reduced. For example, if you want to test T8 vs C8, you would remove the column which gives the interaction of time 8 and treatment.

dds <- DESeq(dds, test="LRT", full=mm, reduced=mm[,-idx])

This is my preferred approach for this design.

Another thing I'm curious about: While in theory the size factor normalization should take care of the one sample that has an order of magnitude more reads than the others (or even more?), can you try downsampling that sample to the depth of the others? I'm curious if that is actually presenting a problem here with your particular design. You can do something like this, to downsample sample 7 by 1/10.

j <- 7
counts(dds)[,j] <- rpois(nrow(dds), lambda=counts(dds)[,j] / 10)

I'm curious if this fixes your above design and the Wald test, which should be fine for doing the pairwise comparisons.