Hello, I have an RNA-Seq experiment from insect antennae across different data points (0,2,4,6 and 8 days) with 7 replicates per data point (except t6 with 6 replicates). I am analyzing using EdgeR to identify differential expressed genes. I have used the following EdgeR script in my analysis: 1) I created the DGEList object, data normalization step and made the design matrix of my experiment
samples_matrix
group <- factor(c('t0','t0','t0','t0','t0','t0','t0','t2','t2','t2','t2','t2','t2','t2','t4','t4','t4','t4','t4','t4','t4','t6','t6','t6','t6','t6','t6','t8','t8','t8','t8','t8','t8','t8'))
y <- DGEList(counts=samples_matrix, group=group)
y <- calcNormFactors(y)
design <- model.matrix(~0+group, data=y$samples)
colnames(design) <- levels(y$samples$group)
The design matrix is composed by 1 and 0 and it has a total of 5 columns (with the different time points) and 34 rows with the samples.
2) I estimated the dispersion and fitted to Quasilikehood ratio F model
y <- estimateDisp(y,design)
fit <- glmQLFit(y,design, robust=TRUE)
3) I defined my contrasts. The first idea was to identify all genes that change in the antennae at any time point. For this reason, we defined a total of 8 contrasts: 4 sequential (across time points) and 4 grouped. The next step will be to perform pairwise comparisons to identify specific changes between time points (not included in the post).
my.contrasts <- makeContrasts(t2vst0=t2-t0, t4vst2=t4-t2, t6vst4=t6-t4, t8vst6=t8-t6, allvst0=(t2+t4+t6+t8)/4-t0, allvst02=(t4+t6+t8)/3-(t0+t2)/2, allvst024=(t6+t8)/2-(t0+t2+t4)/3, allvs8=t8-(t0+t2+t4+t6)/4, levels=design)
4) I performed the QLMFtest, filtered using HTSfilter package (script not included in this post) and topTags function.
res <- glmQLFTest(fit, contrast=my.contrasts)
res_est <- topTags(res_filtered, n=Inf, adjust.method="BH", sort.by="PValue")
Res_est
object contains logFc value for each comparison specified in contrasts and unique values of logCPM, F, PValue and FDR for each gene However, the information in rest_est
objects was "Coefficient: LR test on 4 degrees of freedom", when the number of comparisons is 10 and then the degrees of freedom has to be different. I do not know if I have something wrong in my script or I missed some step in my analysis. Even, I deleted some comparisons from the analysis (at step 3) and the differential expressed genes identified (step 4) remained the same (5066 not sig and 6120 were sig) and statistical values (P value and FDR) did not change.
Thanks a lot for your help,
Jose
Thanks Gordon for your quick answer! I thought that the problem was the pipeline and in deed it was my math.... Thanks again, Jose