RNA-Seq Time series analysis in EdgeR
Entering edit mode
josmantorres ▴ 10
Last seen 2.7 years ago

Hello, I have an RNA-Seq experiment from insect antennae across different data points (0,2,4,6 and 8 days) with 7 replicates per data point (except t6 with 6 replicates). I am analyzing using EdgeR to identify differential expressed genes. I have used the following EdgeR script in my analysis: 1) I created the DGEList object, data normalization step and made the design matrix of my experiment

group <- factor(c('t0','t0','t0','t0','t0','t0','t0','t2','t2','t2','t2','t2','t2','t2','t4','t4','t4','t4','t4','t4','t4','t6','t6','t6','t6','t6','t6','t8','t8','t8','t8','t8','t8','t8'))
y <- DGEList(counts=samples_matrix, group=group)
y <- calcNormFactors(y)
design <- model.matrix(~0+group, data=y$samples)
colnames(design) <- levels(y$samples$group)

The design matrix is composed by 1 and 0 and it has a total of 5 columns (with the different time points) and 34 rows with the samples.

2) I estimated the dispersion and fitted to Quasilikehood ratio F model

y <- estimateDisp(y,design)
fit <- glmQLFit(y,design, robust=TRUE)

3) I defined my contrasts. The first idea was to identify all genes that change in the antennae at any time point. For this reason, we defined a total of 8 contrasts: 4 sequential (across time points) and 4 grouped. The next step will be to perform pairwise comparisons to identify specific changes between time points (not included in the post).

my.contrasts <- makeContrasts(t2vst0=t2-t0, t4vst2=t4-t2, t6vst4=t6-t4, t8vst6=t8-t6, allvst0=(t2+t4+t6+t8)/4-t0, allvst02=(t4+t6+t8)/3-(t0+t2)/2, allvst024=(t6+t8)/2-(t0+t2+t4)/3, allvs8=t8-(t0+t2+t4+t6)/4, levels=design)

4) I performed the QLMFtest, filtered using HTSfilter package (script not included in this post) and topTags function.

res <- glmQLFTest(fit, contrast=my.contrasts)
res_est <- topTags(res_filtered, n=Inf,  adjust.method="BH", sort.by="PValue")

Res_est object contains logFc value for each comparison specified in contrasts and unique values of logCPM, F, PValue and FDR for each gene However, the information in rest_est objects was "Coefficient: LR test on 4 degrees of freedom", when the number of comparisons is 10 and then the degrees of freedom has to be different. I do not know if I have something wrong in my script or I missed some step in my analysis. Even, I deleted some comparisons from the analysis (at step 3) and the differential expressed genes identified (step 4) remained the same (5066 not sig and 6120 were sig) and statistical values (P value and FDR) did not change.

Thanks a lot for your help,


edger Time series analysis • 1.2k views
Entering edit mode
Last seen 1 hour ago
WEHI, Melbourne, Australia

There's nothing wrong. You have 5 time points so there are only 4 df for differences between the time points. Hence any set of 4 or more non-redundant contrasts will lead to the same F test on 4 df.

Entering edit mode

Thanks Gordon for your quick answer! I thought that the problem was the pipeline and in deed it was my math.... Thanks again, Jose


Login before adding your answer.

Traffic: 848 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6