Dear all,
I am using the limma-trend pipeline to analyze RNA-seq data and then applying CAMERA to the results. I have run into some challenges during this process.
- In `camera()`, should I also set `trend.var = TRUE`? (I am following `eBayes()`, where I set `trend = TRUE`.)
My data contain several time points.
```r
library(edgeR)   # DGEList(), cpm()
library(limma)   # lmFit(), contrasts.fit(), eBayes(), camera()

# 'group' must be passed by name: the second positional argument of DGEList() is lib.size
expr <- DGEList(counts, group = group)
logCPM <- cpm(expr, log = TRUE, prior.count = 3)
fit <- lmFit(logCPM, design)
fit2 <- contrasts.fit(fit, my.contrasts)
fit.eb <- eBayes(fit2, trend = TRUE, robust = FALSE)

head(my.contrasts)[1:5, 1:5]
#       Contrasts
# Levels t05h_vs_t0h t1h_vs_t0h t2h_vs_t0h t4h_vs_t0h t8h_vs_t0h
#   t0h           -1         -1         -1         -1         -1
#   t05h           1          0          0          0          0
#   t1h            0          1          0          0          0
#   t2h            0          0          1          0          0
#   t4h            0          0          0          1          0

camera(logCPM, index, design, contrast = my.contrasts[, 1],
       inter.gene.cor = 0.01, trend.var = TRUE, use.ranks = TRUE)
```
- I am trying to use `cameraPR()`, the pre-ranked version, to speed things up. In the help page example, the t-statistic is used. Is the t-statistic recommended, or can I pre-rank using the log2 fold change?
```r
# the moderated t-statistics are in the eBayes() output, not in the lmFit() object
cameraPR(fit.eb$t[, 2], list(set1 = index1, set2 = index2))
```
Thanks,
Jiahao Tian
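On the second question above: `cameraPR()` accepts any numeric per-gene statistic, so one way to see how much the choice matters for a given data set is simply to run it on both. The sketch below does that on simulated two-group data; the data, gene sets, and effect size are all hypothetical, and it is not a recommendation for either statistic.

```r
library(limma)

set.seed(2)
# Hypothetical data: 500 genes, two groups of 3 samples,
# with the first 15 genes shifted up in group B
group <- factor(rep(c("A", "B"), each = 3))
y <- matrix(rnorm(500 * 6, mean = 5), nrow = 500,
            dimnames = list(paste0("gene", 1:500), NULL))
y[1:15, group == "B"] <- y[1:15, group == "B"] + 1.5

design <- model.matrix(~ group)              # column 2 is the B vs A effect
fit.eb <- eBayes(lmFit(y, design), trend = TRUE)

sets <- list(set1 = 1:15, set2 = 301:315)    # hypothetical gene sets (row indices)

# Rank by the moderated t-statistic ...
by.t <- cameraPR(fit.eb$t[, 2], sets)
# ... or by the log2 fold change (the estimated coefficient)
by.logFC <- cameraPR(fit.eb$coefficients[, 2], sets)

by.t
by.logFC
```

The t-statistic folds each gene's variability into the ranking, whereas the log2 fold change ignores it; comparing the two tables on real data shows whether that difference changes which sets come out on top.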
Thanks! Since I used all the gene sets downloaded from MSigDB, `camera()` was actually much slower than I expected.
It is recommended not to use all gene sets from MSigDB (see here). By my very rough estimate there are about 58K gene sets in MSigDB, and several of them come with important legal considerations. Instead, carefully select a subset of GMT files that you think will be most useful for the experiment at hand. You will then need to decide whether to adjust p-values across databases or within each database.
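The two adjustment options can be sketched with base R's `p.adjust()`. The data frame below is a toy stand-in for combined `camera()` output from two collections; the set names, collection labels, and p-values are made up for illustration.

```r
# Toy stand-in for camera() results from two gene-set collections
# (set names, collection labels and p-values are hypothetical)
res <- data.frame(
  set      = c("H_A", "H_B", "H_C", "C2_A", "C2_B"),
  database = c("H", "H", "H", "C2", "C2"),
  PValue   = c(0.001, 0.02, 0.40, 0.003, 0.05)
)

# Option 1: one Benjamini-Hochberg family across all databases
res$FDR.across <- p.adjust(res$PValue, method = "BH")

# Option 2: a separate Benjamini-Hochberg family within each database
res$FDR.within <- ave(res$PValue, res$database,
                      FUN = function(p) p.adjust(p, method = "BH"))

res
```

Adjusting within each database controls the false discovery rate per collection, so smaller collections get lighter corrections; adjusting across databases controls it over everything reported together.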
CAMERA and CAMERA-PR are still leagues faster than almost any other method for analyzing molecular signatures, especially GSEA-like ones. Since it is recommended to use `camera()` with `inter.gene.cor = 0.01` (the default), you could also just fit a model to the genes once, transform the matrix of moderated t-statistics in the `MArrayLM` object to their standard normal equivalents with `limma::zscoreT()` (this is what `camera()` does internally), and then run `cameraPR()` separately on each column of the z matrix. This avoids fitting the model separately for each contrast or coefficient, so it saves some time.

Thank you for your reply. I had indeed ignored the legal considerations; that is a big problem, and I totally agree. In this first step I only took a brief look across all the gene sets; I will select the sets of interest in the next analysis. I have tried `limma::zscoreT()`. Using `cameraPR()` had already saved me a lot of time, and your approach saves even more. Thanks again!
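The approach described above (one model fit, `zscoreT()` on the moderated t-statistics, then `cameraPR()` per contrast) might look roughly like the minimal sketch below. It runs on simulated data: the gene count, time-point labels, gene sets, and effect size are all hypothetical, and passing `fit.eb$df.total` as the degrees of freedom to `zscoreT()` is an assumption about the natural choice, not something stated in the thread.

```r
library(limma)

set.seed(1)
# Hypothetical stand-in for logCPM: 1000 genes, 3 time points x 3 replicates
ngenes <- 1000
group <- factor(rep(c("t0h", "t1h", "t2h"), each = 3))
y <- matrix(rnorm(ngenes * 9, mean = 5), nrow = ngenes,
            dimnames = list(paste0("gene", 1:ngenes), NULL))
y[1:20, group == "t1h"] <- y[1:20, group == "t1h"] + 2   # genes 1-20 respond at t1h

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
my.contrasts <- makeContrasts(t1h_vs_t0h = t1h - t0h,
                              t2h_vs_t0h = t2h - t0h, levels = design)

# Fit the model once for all contrasts
fit <- lmFit(y, design)
fit2 <- contrasts.fit(fit, my.contrasts)
fit.eb <- eBayes(fit2, trend = TRUE)

# Convert each column of moderated t-statistics to standard normal z-scores
# (assumption: df.total is the appropriate degrees of freedom here)
z <- fit.eb$t
for (j in seq_len(ncol(z))) {
  z[, j] <- zscoreT(fit.eb$t[, j], df = fit.eb$df.total)
}

# Hypothetical gene sets, given as row indices
index <- list(set1 = 1:20, set2 = 501:520)

# Run cameraPR() separately on each contrast's z-scores
cam <- lapply(colnames(z), function(cn) cameraPR(z[, cn], index))
names(cam) <- colnames(z)
cam$t1h_vs_t0h
```

Only `lmFit()`, `contrasts.fit()`, and `eBayes()` touch the expression matrix; each additional contrast then costs a single `cameraPR()` call on a vector.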