Dear Limma/EdgeR/Voom developers/users,
I am working on a developmental RNA-Seq data, and would like to fit time-dependent gene transcription profiles to the quadratic function, similar to the way it is done in the Limma Voom article, Law et al. 2014 (Genome biology), for drosophila developmental samples.
My experimental setting is in fact more complex, since it is based on paired design
(3 different 'individuals' are tested, or more correctly cultures, at different time points, and this is repeated for different treatments) ...
would be grateful if you can explain which commands should be used for first performing similar fit as shown in this article, and then expanding it to consider paired design ?
thanks in advance
Assaf
You should separate your tags to
limma
andvoom
, otherwise people won't see this question. Also, give exact details of your experimental design if you want to get useful advice. Otherwise, you'll just be stuck with vague generalities.Hi Aaron,
So, the design is partitioned to 3 layers:
1) genes count measurements taken from a single individual at x time points (currently x= 0 4 12 18 21 24 48 hours).
2) the above experiment is repeated n times, for n different individuals, to give n*x samples
3) the above is repeated 2 times for control, and treatment, to give 2*n*x samples
At this point I have only the sequencing results of a pilot experiment with x=7 samples from a single individual. Not sure the blow code for the pilot experiment is correct. Below code is based on the spline fit example in Limma manual, for order=3, assuming this can represent quadratic trends (is it correct?).
Also: * unclear to me what is the meaning of the topTable() coefficients in this context (since there are no explicit definitions of groups of comparisons). * unclear to me the meaning of the design matrix that was produced, which is very different from the standard EdgeR formats I know (please see below)
thanks in advance for your time and help, (* and thanks for valuable help some time ago)
Assaf
filtered_Counts
t0 t4 t12 t18 t21 t24 t48
c10098_g2 0 0 62 58 46 2 27
c101026_g1 15 43 37 26 33 5 64
c10125_g1 138 548 819 507 374 34 58
c101976_g1 34 121 124 44 39 10 27
c102234_g1 12 73 17 7 13 3 2
...
> time1
[1] 0 4 12 18 21 24 48
> design # for ns(time1, df=3)
(Intercept) X1 X2 X3
1 1 0.0000000 0.0000000 0.0000000
2 1 -0.1102765 0.2696576 -0.1540900
3 1 -0.1171348 0.6066478 -0.3466559
4 1 0.1771790 0.5572614 -0.3078531
5 1 0.3323824 0.4900661 -0.2443235
6 1 0.4347552 0.4341309 -0.1651824
7 1 -0.1621622 0.3783784 0.7837838
# the relevant code:
z = DGEList(counts=filtered_Counts)
z = calcNormFactors(z)
library(splines)
X = ns(time1, df=3)
design = model.matrix(~X)
v = voom(z,design,plot=TRUE)
fit = lmFit(v, design)
fit = eBayes(fit)
topTable(fit, coef= ??? ) # what is the meaning of contrasts in this context ?
Are the treatment individuals the same as the control individuals? In other words, did you treat each individual twice with treatment and control, or are the treated people different from the control people?
Are the treatment individuals the same as the control individuals?
the answer is positive, at least at this point (we have only sequenced a pilot experiment yet, but will expand that)
more specifically, these are not people but embryo cultures, where each culture is composed of thousands of embryos that all came from the same father and mother, but they can be considered as a single individual.
Well, when you get around to it, there are several options for analyzing that kind of experimental design, ranging from simple (treatment-specific intercept and spline coefficients) to complicated (individual-specific intercept and spline coefficients with treatment-time interaction coefficients).