RNA-Seq Time series analysis in EdgeR
1
1
Entering edit mode
josmantorres ▴ 10
@josmantorres-23988
Last seen 18 months ago
Argentina

Hello, I have an RNA-Seq experiment from insect antennae across different data points (0,2,4,6 and 8 days) with 7 replicates per data point (except t6 with 6 replicates). I am analyzing using EdgeR to identify differential expressed genes. I have used the following EdgeR script in my analysis: 1) I created the DGEList object, data normalization step and made the design matrix of my experiment

samples_matrix
group <- factor(c('t0','t0','t0','t0','t0','t0','t0','t2','t2','t2','t2','t2','t2','t2','t4','t4','t4','t4','t4','t4','t4','t6','t6','t6','t6','t6','t6','t8','t8','t8','t8','t8','t8','t8'))
y <- DGEList(counts=samples_matrix, group=group)
y <- calcNormFactors(y)
design <- model.matrix(~0+group, data=y$samples) colnames(design) <- levels(y$samples\$group)


The design matrix is composed by 1 and 0 and it has a total of 5 columns (with the different time points) and 34 rows with the samples.

2) I estimated the dispersion and fitted to Quasilikehood ratio F model

y <- estimateDisp(y,design)
fit <- glmQLFit(y,design, robust=TRUE)


3) I defined my contrasts. The first idea was to identify all genes that change in the antennae at any time point. For this reason, we defined a total of 8 contrasts: 4 sequential (across time points) and 4 grouped. The next step will be to perform pairwise comparisons to identify specific changes between time points (not included in the post).

my.contrasts <- makeContrasts(t2vst0=t2-t0, t4vst2=t4-t2, t6vst4=t6-t4, t8vst6=t8-t6, allvst0=(t2+t4+t6+t8)/4-t0, allvst02=(t4+t6+t8)/3-(t0+t2)/2, allvst024=(t6+t8)/2-(t0+t2+t4)/3, allvs8=t8-(t0+t2+t4+t6)/4, levels=design)


4) I performed the QLMFtest, filtered using HTSfilter package (script not included in this post) and topTags function.

res <- glmQLFTest(fit, contrast=my.contrasts)
res_est <- topTags(res_filtered, n=Inf,  adjust.method="BH", sort.by="PValue")


Res_est object contains logFc value for each comparison specified in contrasts and unique values of logCPM, F, PValue and FDR for each gene However, the information in rest_est objects was "Coefficient: LR test on 4 degrees of freedom", when the number of comparisons is 10 and then the degrees of freedom has to be different. I do not know if I have something wrong in my script or I missed some step in my analysis. Even, I deleted some comparisons from the analysis (at step 3) and the differential expressed genes identified (step 4) remained the same (5066 not sig and 6120 were sig) and statistical values (P value and FDR) did not change.

Thanks a lot for your help,

Jose

edger Time series analysis • 575 views
1
Entering edit mode
@gordon-smyth
Last seen 27 minutes ago
WEHI, Melbourne, Australia

There's nothing wrong. You have 5 time points so there are only 4 df for differences between the time points. Hence any set of 4 or more non-redundant contrasts will lead to the same F test on 4 df.

0
Entering edit mode

Thanks Gordon for your quick answer! I thought that the problem was the pipeline and in deed it was my math.... Thanks again, Jose