I have RNA-seq data with three factors: treatment, time and temperature.
I want to get the differentially expressed genes for the treatment, but the design should also account for temperature and time.
I have created the following setup:
d <- DGEList(counts = countTable)
# Normalization must be done before GLM analysis.
dge <- calcNormFactors(d)
design <- model.matrix(~0 + treated + time + temperature)
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)
glmfit.dge <- glmFit(dge, design, dispersion = dge$common.dispersion)
lrt.dge <- glmLRT(glmfit.dge, coef = "treated")
result <- topTags(lrt.dge, adjust.method = "BH", sort.by = "logFC")
I have some doubts about the design model. I have no experience with design matrices, so I am unsure whether what I am doing is correct.
Could someone explain why what I am doing here is right or wrong? Also, do I have to use 0+ in the formula? If I leave it out, an intercept is created; what does this do?
That's better. As for the first question, no, the commands aren't obsolete. You still need to estimate the dispersions in order to do the testing. Actually, you could just make life easier for yourself and do:

dge <- estimateDisp(dge, design)

which estimates the common, trended and tagwise dispersions in a single call, replacing the separate estimateGLMCommonDisp and estimateGLMTagwiseDisp steps.
For the second question, if you change the design matrix to have an intercept, then the step:

lrt.dge <- glmLRT(glmfit.dge, coef=2)  # second coefficient = treatment effect

... will be correct if you want to detect differences due to treatment. This would do the same thing as using a design matrix without an intercept and then calling makeContrasts, as I've described above. The only difference between these two strategies is ease of use; for a design matrix with an intercept, the call to glmLRT is simpler, but the interpretation of the coefficients is harder.
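To make the two strategies concrete, here is a sketch, assuming `dge` already has its dispersions estimated (e.g. via estimateDisp) and that `treated` is a two-level factor; the level names "ctrl" and "trt" are illustrative, so check colnames(design) for the actual coefficient names in your data:

```r
library(edgeR)  # also loads limma, which provides makeContrasts()

# Strategy 1: no intercept. Each "treated" coefficient is the mean
# log-expression of one group, so the treatment effect must be
# extracted explicitly as a contrast between the two groups.
design1 <- model.matrix(~0 + treated + time + temperature)
fit1 <- glmFit(dge, design1)
con <- makeContrasts(treatedtrt - treatedctrl, levels = design1)
lrt1 <- glmLRT(fit1, contrast = con)

# Strategy 2: with intercept. The "treatedtrt" coefficient is already
# the log-fold-change of treated versus the baseline level, so it can
# be tested directly without a contrast.
design2 <- model.matrix(~treated + time + temperature)
fit2 <- glmFit(dge, design2)
lrt2 <- glmLRT(fit2, coef = "treatedtrt")

topTags(lrt2)  # same test as topTags(lrt1)
```

Both fits block on time and temperature, so the treatment comparison is adjusted for those factors either way.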