Limma Voom and quadratic regression
1
0
Entering edit mode
assaf www ▴ 140
@assaf-www-6709
Last seen 5.3 years ago

Dear Limma/EdgeR/Voom developers/users,

I am working on a developmental RNA-Seq data, and would like to fit time-dependent gene transcription profiles to the quadratic function, similar to the way it is done in the Limma Voom article, Law et al. 2014 (Genome biology), for drosophila developmental samples.

My experimental setting is in fact more complex, since it is based on paired design
(3 different 'individuals' are tested, or more correctly cultures, at different time points, and this is repeated for different treatments) ...

would be grateful if you can explain which commands should be used for first performing similar fit as shown in this article, and then expanding it to consider paired design ?


thanks in advance

Assaf

limma voom • 2.4k views
ADD COMMENT
0
Entering edit mode

You should separate your tags to limma and voom, otherwise people won't see this question. Also, give exact details of your experimental design if you want to get useful advice. Otherwise, you'll just be stuck with vague generalities.

ADD REPLY
0
Entering edit mode

Hi Aaron,

So, the design is partitioned to 3 layers:

1) genes count measurements taken from a single individual at x time points (currently x= 0  4 12 18 21 24 48 hours).

2) the above experiment is repeated n times, for n different individuals, to give n*x samples

3) the above is repeated 2 times for control, and treatment, to give 2*n*x samples

At this point I have only the sequencing results of a pilot experiment with x=7 samples from a single individual. Not sure the blow code for the pilot experiment is correct. Below code is based on the spline fit example in Limma manual, for order=3, assuming this can represent quadratic trends (is it correct?).

Also: * unclear to me what is the meaning of the topTable() coefficients in this context (since there are no explicit definitions of groups of comparisons). * unclear to me the meaning of the design matrix that was produced, which is very different from the standard EdgeR formats I know (please see below)

thanks in advance for your time and help, (* and thanks for valuable help some time ago)
Assaf

 

filtered_Counts
            t0  t4 t12 t18 t21 t24 t48
c10098_g2    0   0  62  58  46   2  27
c101026_g1  15  43  37  26  33   5  64
c10125_g1  138 548 819 507 374  34  58
c101976_g1  34 121 124  44  39  10  27
c102234_g1  12  73  17   7  13   3   2
...

> time1
[1]  0  4 12 18 21 24 48

> design # for ns(time1, df=3)
  (Intercept)         X1        X2         X3
1           1  0.0000000 0.0000000  0.0000000
2           1 -0.1102765 0.2696576 -0.1540900
3           1 -0.1171348 0.6066478 -0.3466559
4           1  0.1771790 0.5572614 -0.3078531
5           1  0.3323824 0.4900661 -0.2443235
6           1  0.4347552 0.4341309 -0.1651824
7           1 -0.1621622 0.3783784  0.7837838

# the relevant code:
z = DGEList(counts=filtered_Counts)
z = calcNormFactors(z)
library(splines)
X = ns(time1, df=3)
design = model.matrix(~X)
v = voom(z,design,plot=TRUE)
fit = lmFit(v, design)
fit = eBayes(fit)
topTable(fit, coef= ??? ) # what is the meaning of contrasts in this context ?

 

ADD REPLY
0
Entering edit mode

Are the treatment individuals the same as the control individuals? In other words, did you treat each individual twice with treatment and control, or are the treated people different from the control people?

ADD REPLY
0
Entering edit mode

Are the treatment individuals the same as the control individuals?

the answer is positive, at least at this point (we have only sequenced a pilot experiment yet, but will expand that)

more specifically, these are not people but embryo cultures, where each culture is composed of thousands of embryos that all came from the same father and mother, but they can be considered as a single individual.

ADD REPLY
0
Entering edit mode

Well, when you get around to it, there are several options for analyzing that kind of experimental design, ranging from simple (treatment-specific intercept and spline coefficients) to complicated (individual-specific intercept and spline coefficients with treatment-time interaction coefficients).

ADD REPLY
2
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

You ask "what is the meaning of contrasts in this context". The topTable is finding genes whose differential expression changes with time, regardless of the shape of the trend. It will find almost anything -- genes that go up, genes than go down, genes that go up then down, genes that go up then down then up again, etc.

If you want to view the trend for a particular gene 'i', just use:

mu <- fit$coefficients %*% t(design)
plot(mu[i,])

or alternatively

plot(Time, mu[i,])

Yes, the spline with three parameters can represent quadratic and indeed cubic trends.

ADD COMMENT
1
Entering edit mode

To elaborate on Gordon's answer, you should be using topTable with coef=2:4 in order to test for any trend. This will drop all of the spline coefficients, i.e., the null model will contain only an intercept term, such that it assumes that there is no time effect at all. Don't try to interpret the meaning of the individual coefficients X1-3, as you need to consider them altogether to determine the effect on the fitted value. Similarly, don't try to interpret the log-fold changes from topTable, as these represent changes in the coefficients. The major thing you should get out of the analysis is the (adjusted) p-value indicating whether or not there is a significant trend; you can then go and look at the trends in more detail, using the code that Gordon has suggested.

ADD REPLY
0
Entering edit mode

Hi Gordon, and Aaron,

OK, so I understand how to compare the null model (no spline), and the alternative one, at least in this simple setting with no paired design, and, in order to plot the fit for a given gene, the 4 values of its fit (intercept,x1,x2,x3) are multiplied by the corresponding 4 values of the design matrix for each time point, and the sums of products are the predicted outcomes of the gene's time-points.

many thanks, I'm looking into this

Assaf

ADD REPLY
0
Entering edit mode

It is good practice to specify coef=2:4. I didn't say this explicitly because calling

topTable(fit)

will do the same thing implicitly. In this case limma tests for all the coefficients, but drops the first one automatically because it is called "(Intercept)".

ADD REPLY
0
Entering edit mode

Sorry, but what exactly is mu here, i.e. how do I get from expression levels to mu and back?

ADD REPLY
0
Entering edit mode

mu is the matrix of fitted values for all observations, i.e., the value of the fitted spline at the time point of each sample. To get to mu from the expression levels, you fit the linear model to the latter with lmFit (as described in the code in the comments of the original post) and then apply Gordon's bit of code. To get to the expression levels from mu, you need to add the residuals of the fitted model to mu. But I don't really see why you would need to do the second thing - if you want the expression levels, you should already have them, otherwise how did you fit the linear model?

ADD REPLY

Login before adding your answer.

Traffic: 735 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6