I'm analyzing 36 RNAseq samples in order to obtain differentially expressed genes. The study design is a single time course with 12 timepoints ("t1" to "t12") and 3 replicates per timepoint (A, B and C)
First of all I want to confirm the strategy I'm following it's the correct one. I'm using edgeR-glm for doing the comparisons 't2-t1', 't3-t2', 't4-t3', 't5-t4' and so on. What I pretend here is to obtain the DE genes between each time. I know about the existence of packages prepared to do time course analyses, such as 'maSigPro'. However I had a problem running this analyses and I think it's due to the large amount of timepoints. Specifically, the command that fails is:
design <- make.design.matrix(eDesign, degree = 2) min.obs <- (2+1)*(ncol(eDesign)-2)+1 #command that fails: fit <- p.vector(data = counts_norm, design = design, Q = 0.05, counts = TRUE, min.obs = min.obs)
Here a little snippet of the 'eDesign' matrix:<caption>eDesign matrix</caption>
I have tried up to
degree = 5, and still failing. I think the problem is the timepoints number (12) because if I group them in only 5 the command runs smoothly.
That's the major cause I'm doing this pairwise comparisons between times ('t2-t1', 't3-t2', 't4-t3'...), cause I can't directly use maSigPro with the 12 timepoints. Moreover, with the edger-glm pairwise approach I can know exactly what genes are DE between each paire of times. Haven't seen this is possible with maSigPro.
For this 5 timepoint grouping I have been based on the plotMDS results and the DE analyses between times. And here comes my next question:
- From what data should MDS plots be done: raw counts or normalized counts? I would say normalized data cause they take into account lib. sizes. The point is I don't obtain the same results when running next commands:
# First one DGE <- calcNormFactors(DGE, method = "TMM") plotMDS(DGE)
# Second one DGE_norm <- tmm(DGE$counts, long = 1000, lc = 0, k = 0) plotMDS(DGE_norm)
calcNormFactors() doesn't actually normalize the counts, just calculates the normalization factor. However in the edgeR manual (https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf), sections 4.1.4 and 4.1.5, the pass to limma::plotMDS() this normalized DGElist, such as the first comand above. Doing that I obtain the same MDS plot that with the raw counts (the raw DGElist without doing calcNormFactors()). The second command does generate a different plot. I could attach the plots if someone ask for them.
I'm interested in knowing how my samples group because I need to reduce the timepoints from 12 to 4-6 in order to use maSigPro.