Question: Time course experiment using limma with voom
0
5 months ago by
Helsinki
mahes.muniandy0 wrote:

Hi,

I am performing RNA sequencing data analysis on weight loss data with three timepoints: before diet, 2 months after diet and 10 months after diet (I’m calling the timepoints TP1, TP2, TP3). My variable of interest is percentage weight lost, so I set my percentage weight lost at TP1 at 0 and TP2 and TP3 at (weight1-weight2)/weight1 and (weight1-weight3)/weight1 respectively. At the moment I am analysing my TP2vsTP1 and TP3vsTP1 separately. Not all individuals have data from all three timepoints.

My limma model looks like this:

design <- model.matrix(~ 0 + sequencing_center + sex +  study_center + age + perc_weightlost_TP1TP2)
v <- voom(dgeT1T2, design)
corfit <- duplicateCorrelation(v,design,block=SubjectID)
v <- voom(dgeT1T2,design,block= SubjectID,correlation=corfit$consensus) fit <- lmFit(v,design,block=ID,correlation=corfit$consensus)
fit <- eBayes(fit)


My interest is in finding out how gene expression changes according to percentage weight lost. Should I be putting the time point into my design matrix above? I had figured it was not necessary because the percentage weight lost already captures this information. However, if I want to put all three time points in my analysis, should I go about it as below?

design <- model.matrix(~ 0 + sequencing_center + sex +  study_center + age + TP:perc_weightlost)


I would be grateful for any pointers on how to analyse this type of data.

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /apps/statistics2/R-3.5.1/lib/libRblas.so
LAPACK: /apps/statistics2/R-3.5.1/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
[3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.24.3 limma_3.38.3

loaded via a namespace (and not attached):
[1] compiler_3.5.1  Rcpp_1.0.1      grid_3.5.1      locfit_1.5-9.1
[5] statmod_1.4.30  lattice_0.20-38


Thank you,

Mahes Muniandy,

University of Helsinki

modified 5 months ago by Aaron Lun25k • written 5 months ago by mahes.muniandy0
Answer: Time course experiment using limma with voom
1
5 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

If weight loss and time are well correlated, you're in trouble. With your experimental design, there's no way to distinguish between the effect of time - due to aging or whatever - and the effect of weight loss. The correct way to do it would be to have a control group without any diet, which would provide a baseline for the time effect.

Nonetheless, you might be able to get something out of this data if time and weight loss are not well correlated (e.g., some individuals lose more weight than others, or weight loss is not linear with respect to time.) You can then put both factors into the model to identify significant effects associated with weight loss that are conditional on the time effect. This reduces the chance that the resulting DE genes are driven by the confounding time effect. However, this comes at the cost of a loss of power, which is more severe as time and weight loss become more correlated.

Note that if time and weight loss are perfectly correlated, then you can't put them in the same model together, as the coefficients will not be estimable. And TP:perc_weightlost doesn't make sense, just use an additive model.

## Edit:

I hadn't appreciated that you were modeling time as a factor, not a covariate. That's fine, and probably the correct approach, but it changes my answer a bit. Let's mock up a sample table for demonstration purposes.

patient <- gl(8, 2)
time <- rep(c("T0", "T1"), 8) # Two time points, for simplicity.

set.seed(100)
weight0 <- rnorm(8, mean=70)
weight1 <- weight0 - runif(8, 0, 10)
wloss <- as.vector(rbind(0, 1 - weight0/weight1)) # interleaved


design <- model.matrix(~ 0 + patient + time + time:wloss)
design <- design[,-10] # remove the all-zero column.


The first 8 coefficients are patient blocking factors, representing the fitted expression of each patient at time zero. The next coefficient is the average log-fold change due to time within each patient. The final coefficient represents the "effect" (i.e., association) of expression with weight loss at time point 1. As I mentioned before, the last two coefficients are likely to be almost completely confounding, which will reduce your power to detect changes compared to a more carefully designed experiment. This is the correct behavior, because otherwise you would get false positives associated with weight loss that are actually caused by time.

The above example is fairly easy to extend to three time points. Note that if age and sex are the same for all samples derived from a single patient, they will be redundant with the patient blocking factors - you don't need them. Same for sequencing center, if all samples from a single patient were sequenced at the same location.

Hi Aaron,

Thank you so much for the guidance. Yes, some individuals lose more weight than others and weight loss is not linear with respect to time, plus between timepoints 2 and 3, the diet is different than between timepoints 1 and 2. I realise all this complicates matters and I am quite confused on how to model the data.

If I “put both factors into the model to identify significant effects associated with weight loss that are conditional on the time effect” would it be a simple additive model with a contrast matrix? I'm not quite sure how to do a contrast matrix with a continuous and categorical variable. Or can I make the timepoint the intercept?

´´´
design <- model.matrix(~ 0 + sequencing_center + sex +  study_center + age + perc_weightlost + Timepoint)

´´´


I am also unsure about my independent variable (percentage weight lost). The reason I use percentage weight lost is because the heavier one is, the easier it is to lose weight. So, kgs lost is not comparable across the individuals. But for me to use percentage weight lost, I have made the value zero at TP1. In hind-sight, I am thinking this is wrong, because then the model cannot find the association between percentage weight lost and gene expression at TP1. Is this correct?

If I use weight (kgs) at TP1, TP2, TP3, how do I model the effects of starting weight?

Thank you so much, Mahes Muniandy

See my updated answer. Don't worry about the fact that the weight loss is zero at the first time point. There's no need for the model to account for this, because there's nothing there!

Now, the appropriate model would be different if you wanted to account for the association of expression with the starting weight for each patient, in which case you would need to put in weights at the first time point. But this makes the model more complicated (i.e., the coefficients become more difficult to interpret) and only helps if the weight-related association at the first time point disappears at later time points (otherwise the patient blocking factors would be sufficient).

Hi Aaron, Thank you once again. I have tried the below model as suggested and as expected I get fewer significant genes compared to not having the time points in the model.

design <- model.matrix(~ 0 + patient + time + time:wloss)


Just one more question, if I want to find the effects of age and sex between the individuals, can I extend this to the multi-level model with duplicateCorrelation? Many Thanks, Mahes

Yes, if you wanted to do that, block on patient in duplicateCorrelation() and put in age and sex as additive terms in the design matrix. Those things don't seem that interesting to me, though; perhaps an interaction between wloss and sex would be more fun.