Question: Incorporating subject ID as random effect for repeated measures RNA-Seq analysis
gravatar for mikhael.manurung
12 weeks ago by
mikhael.manurung10 wrote:

Dear all,

I am assigned to analyse RNA-Seq data from a study with a repeated measures design with three time points. From what I understand (after going through Q&As in this Bioconductor forum and reading several manuals), the limma package is able to incorporate the within-subjects correlation via the blocking argument and the duplicateCorrelation command. However, a statistician in my university advised me to use lme4 for loops instead to fit every gene one by one (after voom transformation & using sample-specific weights obtained from the command).

The latter approach took a LOT of time to run (few hours), while limma only took a few seconds. In addition, working within the limma workflow enables me to use the included commands such as camera and goana seamlessly.

What do you usually do when you analyse repeated measures RNA-Seq data? How many of you use custom-scripts with lme4/nlme to fit mixed model for each genes? How common is the practice anyway?

Thank you very much for your time.

Best regards,

ADD COMMENTlink modified 12 weeks ago by Gordon Smyth35k • written 12 weeks ago by mikhael.manurung10
gravatar for Gordon Smyth
12 weeks ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

It is very rare to use lme4 for a microarray differential expression analysis and even rarer for RNA-seq. Some of the reasons are explained in the previous thread linked to by Aaron Lun.

I'm the limma author, so you might consider me biased, but I would not seriously consider using lme4 for an RNA-seq differential expression analysis. Apart from speed and convenience, the loss of power and the increase in false discovery rate would be unacceptable to me.

I have used lme4 myself for one particular microarray study:

but the purpose of that study was to estimate the variance components, not to do differential expression analysis.

ADD COMMENTlink modified 12 weeks ago • written 12 weeks ago by Gordon Smyth35k

Dear Prof. Smyth,

Thank you very much for your response.

I am also biased toward limma due to its ease of use and seamless integration with various downstream analyses. However, due to my lack of credentials as a bioinformatician, no one took me seriously when I (tried to) explain the eBayes principle and its utility for analysing genomics data.

Again, thank you for your contribution to the community.


ADD REPLYlink written 11 weeks ago by mikhael.manurung10
gravatar for Ryan C. Thompson
12 weeks ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.0k wrote:

There are trade-offs to both approaches. If you fit a separate model for every gene, you lose the core feature of limma: empirical Bayes information sharing when estimating the variance for each gene. The other main difference is that limma's mixed model functionality estimates a single correlation value for the random effect that is shared across all genes when fitting the model. If you have hundreds of samples, it might make sense to fit a separate model for each gene using lme4 or some other package. However, most RNA-seq experiments don't have sufficient sample sizes to robustly estimate parameters for each gene independently, so you generally want to stick to limma or other methods (like edgeR and DESeq2) that share information between genes to make up for the small sample sizes.

ADD COMMENTlink modified 12 weeks ago • written 12 weeks ago by Ryan C. Thompson7.0k

Also check out some related comments at A: How does edgeR compare to lme4?.

ADD REPLYlink written 12 weeks ago by Aaron Lun21k

Dear Ryan,

Thank you for your response. Indeed, I did not have that many samples. I have to say that I really like the underlying principle of eBayes!


ADD REPLYlink written 11 weeks ago by mikhael.manurung10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 296 users visited in the last hour