Incorporating subject ID as random effect for repeated measures RNA-Seq analysis
2
0
Entering edit mode
@mikhaelmanurung-17423
Last seen 2.5 years ago
Netherlands

Dear all,

I am assigned to analyse RNA-Seq data from a study with a repeated measures design with three time points. From what I understand (after going through Q&As in this Bioconductor forum and reading several manuals), the limma package is able to incorporate the within-subjects correlation via the blocking argument and the duplicateCorrelation command. However, a statistician in my university advised me to use lme4 for loops instead to fit every gene one by one (after voom transformation & using sample-specific weights obtained from the command).

The latter approach took a LOT of time to run (few hours), while limma only took a few seconds. In addition, working within the limma workflow enables me to use the included commands such as camera and goana seamlessly.

What do you usually do when you analyse repeated measures RNA-Seq data? How many of you use custom-scripts with lme4/nlme to fit mixed model for each genes? How common is the practice anyway?

Thank you very much for your time.

Best regards,
Mikhael

limma rna-seq mixed model repeated measures • 2.5k views
ADD COMMENT
4
Entering edit mode
@gordon-smyth
Last seen 15 minutes ago
WEHI, Melbourne, Australia

It is very rare to use lme4 for a microarray differential expression analysis and even rarer for RNA-seq. Some of the reasons are explained in the previous thread linked to by Aaron Lun.

I'm the limma author, so you might consider me biased, but I would not seriously consider using lme4 for an RNA-seq differential expression analysis. Apart from speed and convenience, the loss of power and the increase in false discovery rate would be unacceptable to me.

I have used lme4 myself for one particular microarray study:

    https://doi.org/10.1371/journal.pone.0019556

but the purpose of that study was to estimate the variance components, not to do differential expression analysis.

ADD COMMENT
0
Entering edit mode

Dear Prof. Smyth,

Thank you very much for your response.

I am also biased toward limma due to its ease of use and seamless integration with various downstream analyses. However, due to my lack of credentials as a bioinformatician, no one took me seriously when I (tried to) explain the eBayes principle and its utility for analysing genomics data.

Again, thank you for your contribution to the community.

Best,
MIkhael
 

ADD REPLY
1
Entering edit mode
@ryan-c-thompson-5618
Last seen 9 weeks ago
Icahn School of Medicine at Mount Sinai…

There are trade-offs to both approaches. If you fit a separate model for every gene, you lose the core feature of limma: empirical Bayes information sharing when estimating the variance for each gene. The other main difference is that limma's mixed model functionality estimates a single correlation value for the random effect that is shared across all genes when fitting the model. If you have hundreds of samples, it might make sense to fit a separate model for each gene using lme4 or some other package. However, most RNA-seq experiments don't have sufficient sample sizes to robustly estimate parameters for each gene independently, so you generally want to stick to limma or other methods (like edgeR and DESeq2) that share information between genes to make up for the small sample sizes.

ADD COMMENT
1
Entering edit mode

Also check out some related comments at A: How does edgeR compare to lme4?.

ADD REPLY
0
Entering edit mode

Dear Ryan,

Thank you for your response. Indeed, I did not have that many samples. I have to say that I really like the underlying principle of eBayes!

Best,
Mikhael

ADD REPLY

Login before adding your answer.

Traffic: 983 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6