Hi,
I'm analyzing 36 RNA-seq samples in order to obtain differentially expressed genes. The study design is a single time course with 12 timepoints ("t1" to "t12") and 3 replicates per timepoint (A, B and C).
First of all, I want to confirm that the strategy I'm following is the correct one. I'm using edgeR-glm to do the comparisons 't2-t1', 't3-t2', 't4-t3', 't5-t4' and so on. What I intend here is to obtain the DE genes between each pair of consecutive timepoints. I know there are packages designed for time course analyses, such as 'maSigPro'. However, I had a problem running that analysis, and I think it's due to the large number of timepoints. Specifically, the command that fails is:
design <- make.design.matrix(eDesign, degree = 2)
min.obs <- (2+1)*(ncol(eDesign)-2)+1
# command that fails:
fit <- p.vector(data = counts_norm, design = design, Q = 0.05, counts = TRUE, min.obs = min.obs)
Here is a small snippet of the 'eDesign' matrix:
| Sample | Time | Replicates | Group |
|---|---|---|---|
| A1 | 1 | 1 | 1 |
| B1 | 1 | 1 | 1 |
| C1 | 1 | 1 | 1 |
| A2 | 2 | 2 | 1 |
| B2 | 2 | 2 | 1 |
| C2 | 2 | 2 | 1 |
| ... | ... | ... | ... |
| A12 | 12 | 12 | 1 |
| B12 | 12 | 12 | 1 |
| C12 | 12 | 12 | 1 |
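(For reference, a minimal sketch of how a matrix with this structure could be built in R; the object and sample names are just a reconstruction of the snippet above, not code from my actual script:)

```r
# Hypothetical reconstruction of the eDesign matrix shown above:
# 12 timepoints, 3 replicates per timepoint, a single group.
timepoint <- rep(1:12, each = 3)
eDesign <- cbind(Time = timepoint,
                 Replicates = timepoint,  # replicate blocks coincide with timepoints
                 Group = 1)
rownames(eDesign) <- paste0(rep(c("A", "B", "C"), times = 12),
                            rep(1:12, each = 3))
head(eDesign)
```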
I have tried up to degree = 5 and it still fails. I think the problem is the number of timepoints (12), because if I group them into only 5 the command runs smoothly.
That's the main reason I'm doing these pairwise comparisons between times ('t2-t1', 't3-t2', 't4-t3', ...): I can't use maSigPro directly with the 12 timepoints. Moreover, with the edgeR-glm pairwise approach I know exactly which genes are DE between each pair of timepoints; I haven't seen that this is possible with maSigPro.
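(In case it helps, a minimal sketch of that pairwise edgeR-glm approach, assuming a raw count matrix `counts` with the 36 samples ordered by timepoint; the object names and the single `t2 - t1` contrast are illustrative:)

```r
library(edgeR)

# Illustrative sample annotation: 12 timepoints, 3 replicates each
time <- factor(rep(paste0("t", 1:12), each = 3), levels = paste0("t", 1:12))
design <- model.matrix(~ 0 + time)
colnames(design) <- levels(time)

DGE <- DGEList(counts = counts, group = time)
DGE <- calcNormFactors(DGE, method = "TMM")
DGE <- estimateDisp(DGE, design)
fit <- glmFit(DGE, design)

# One likelihood-ratio test per pair of consecutive timepoints, e.g. t2 vs t1
con <- makeContrasts(t2 - t1, levels = design)
lrt <- glmLRT(fit, contrast = con)
topTags(lrt)
```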
For this 5-timepoint grouping I have relied on the plotMDS results and the DE analyses between times. And here comes my next question:
- From what data should MDS plots be made: raw counts or normalized counts? I would say normalized data, because they take into account library sizes. The point is that I don't obtain the same results when running the following commands:
# First one
DGE <- calcNormFactors(DGE, method = "TMM")
plotMDS(DGE)
# Second one
DGE_norm <- tmm(DGE$counts, long = 1000, lc = 0, k = 0)
plotMDS(DGE_norm)
I know calcNormFactors() doesn't actually normalize the counts, it just calculates the normalization factors. However, in the edgeR manual (https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf), sections 4.1.4 and 4.1.5, they pass this DGEList to limma::plotMDS(), as in the first command above. Doing that, I obtain the same MDS plot as with the raw counts (the raw DGEList without calcNormFactors()). The second command does generate a different plot. I can attach the plots if someone asks for them.
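(One way I could make sure the TMM factors are actually used in the MDS plot, staying within standard edgeR functions, would be to compute the normalized log-CPM matrix explicitly and pass that to plotMDS(); `DGE` is the DGEList from the first command above:)

```r
# cpm() applies the norm.factors stored by calcNormFactors()
DGE <- calcNormFactors(DGE, method = "TMM")
logCPM <- cpm(DGE, log = TRUE)
plotMDS(logCPM)
```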
I'm interested in knowing how my samples group because I need to reduce the timepoints from 12 to 4-6 in order to use maSigPro.
Many thanks,
Dani
What exactly is the tmm() function that you are using? There is no such function in any of the packages you say that you are using (limma, edgeR or maSigPro).
I am guessing that tmm() is actually a function from the NOISeq package. Why are you using it? It doesn't produce a DGEList object and it therefore will give quite different results in downstream analysis as compared to calcNormFactors(). You would do better to stick to the regular TMM method that is implemented in the edgeR package.
I think that NOISeq::tmm is probably a sort of approximation to cpm() in the edgeR package, but the help page for tmm() doesn't document what the output is so I don't know for sure.