using limma trend workflow in CAMERA
1
0
Entering edit mode
Jay • 0
@b9a7fe93
Last seen 11 weeks ago
Taiwan

Dear all,

I am using the limma trend pipeline to analyze RNA-seq data and applying it to the CAMERA pipeline. I have some chanllenges during this process.

  1. In the camera() pipeline, should I also set trend.var = TRUE ? (I refered to eBayes, which should also be set as TRUE here)

My data contains different time point.

expr = DGEList(counts, group)
logCPM <- cpm(expr, log = TRUE, prior.count = 3)
fit <- lmFit(logCPM, design)
fit.eb <- eBayes(fit, trend = TRUE)

fit2 = contrasts.fit(fit, my.contrasts)
fit.eb = eBayes(fit2, trend = TRUE, robust= FALSE)

head(my.contrasts)[1:5,1:5]
# Contrasts
#Levels t05h_vs_t0h t1h_vs_t0h t2h_vs_t0h t4h_vs_t0h t8h_vs_t0h
#  t0h           -1         -1         -1         -1         -1
#  t05h           1          0          0          0          0
#  t1h            0          1          0          0          0
#  t2h            0          0          1          0          0
#  t4h            0          0          0          1          0

camera(logCPM, index, design, contrast=my.contrasts[,1],
            inter.gene.cor=0.01, trend.var = TRUE, use.ranks=TRUE)
  1. I am trying to use cameraPR() as a 'pre-ranked' version to speed up. In the help documentation example, I noticed the t value was used. Is the t value more recommended, or can I pre-rank using log2 fold change?
cameraPR(fit$t[,2], list(set1=index1,set2=index2))

Thanks,

Jiahao Tian

CAMERA limma • 644 views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 11 hours ago
WEHI, Melbourne, Australia

Yes, you should set trend.var=TRUE in camera whenever you would use trend=TRUE for eBayes. In particular, trend.var=TRUE should always be used when analysing logCPM values from RNA-seq. This is to do with the mean-variance relationship, not with whether your experiment includes time points.

When using cameraPR, ranking by moderated t-statistic is generally better than ranking by logFC because the t-statistic incorporates both fold-change and variability instead of fold-change alone. Ranking by logFC can often prioritise low-count genes. However, you have used cpm with a large prior.count in this case, so this effect is mitigated and ranking by logFC should also work well.

Do you need a "speed up" since camera is almost instantaneous anyway?

ADD COMMENT
0
Entering edit mode

Tnx! Since I used all the gene sets downloaded from Msigdb, the camera() was actually much slower than I expected.

ADD REPLY
0
Entering edit mode

It is recommended to not use all gene sets from MSigDB (see here). There are (by my very rough estimation) about 58K gene sets in the MSigDB, and several of them have important legal considerations. Instead, carefully select a subset of GMT files that you think will be most useful for the experiment at hand. Then, you will need to decide if you want to adjust p-values across databases or within databases.

CAMERA and CAMERA-PR are still leagues faster than almost any other method for analyzing molecular signatures, especially those that are GSEA-like. Since it is recommended to use camera with inter.gene.cor = 0.01 (the default), you could also just fit a model to the genes, transform the matrix of moderated t-statistics from the MArrayLM object to their standard Normal equivalents with limma::zscoreT (this is what camera() does internally), and then use cameraPR separately on each column vector of the z matrix. This will avoid having to fit the model separately for each contrast or coefficient, so it will save some time. For example

# Fit the model
fit <- lmFit(object, design)
fit.contr <- contrasts.fit(fit, contrasts) # may not be applicable, depending on the design
fit.smooth <- eBayes(fit.contr)

zmat <- with(fit.smooth, zscoreT(t, df.total))

# create index (see limma::ids2indices() using zmat)

res <- lapply(colnames(zmat), function(coef_i) {
  out <- cameraPR(statistic = zmat[, coef_i], 
                  index = index, 
                  inter.gene.cor = 0.01)
  out$GeneSet <- rownames(out)

  return(out)
})
names(res) <- colnames(zmat)
res <- data.table::rbindlist(res, idcol = "contrast")
ADD REPLY
0
Entering edit mode

Thank you for your reply. I did ignore the legal considerations. That is a big problem. Totally agree. In the first step, I only took a breif check about all the gene sets. Will select interesting sets in the next analysis. I have tried limma::zscoreT. As for previous, using cameraPR has save me lost of time. Your method saved more. Tnx again!

ADD REPLY

Login before adding your answer.

Traffic: 439 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6