Question

using limma trend workflow in CAMERA

0

Entering edit mode

Jay • 0

@b9a7fe93

Last seen 3 months ago

Taiwan

Dear all,

I am using the limma trend pipeline to analyze RNA-seq data and applying it to the CAMERA pipeline. I have some chanllenges during this process.

In the camera() pipeline, should I also set trend.var = TRUE ? (I refered to eBayes, which should also be set as TRUE here)

My data contains different time point.

expr = DGEList(counts, group)
logCPM <- cpm(expr, log = TRUE, prior.count = 3)
fit <- lmFit(logCPM, design)
fit.eb <- eBayes(fit, trend = TRUE)

fit2 = contrasts.fit(fit, my.contrasts)
fit.eb = eBayes(fit2, trend = TRUE, robust= FALSE)

head(my.contrasts)[1:5,1:5]
# Contrasts
#Levels t05h_vs_t0h t1h_vs_t0h t2h_vs_t0h t4h_vs_t0h t8h_vs_t0h
#  t0h           -1         -1         -1         -1         -1
#  t05h           1          0          0          0          0
#  t1h            0          1          0          0          0
#  t2h            0          0          1          0          0
#  t4h            0          0          0          1          0

camera(logCPM, index, design, contrast=my.contrasts[,1],
            inter.gene.cor=0.01, trend.var = TRUE, use.ranks=TRUE)

I am trying to use cameraPR() as a 'pre-ranked' version to speed up. In the help documentation example, I noticed the t value was used. Is the t value more recommended, or can I pre-rank using log2 fold change?

cameraPR(fit$t[,2], list(set1=index1,set2=index2))

Thanks,

Jiahao Tian

CAMERA limma • 668 views

ADD COMMENT • link 5 months ago • updated 4 months ago Jay • 0

score 3 · Accepted Answer · 2024-08-15

3

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

Yes, you should set trend.var=TRUE in camera whenever you would use trend=TRUE for eBayes. In particular, trend.var=TRUE should always be used when analysing logCPM values from RNA-seq. This is to do with the mean-variance relationship, not with whether your experiment includes time points.

When using cameraPR, ranking by moderated t-statistic is generally better than ranking by logFC because the t-statistic incorporates both fold-change and variability instead of fold-change alone. Ranking by logFC can often prioritise low-count genes. However, you have used cpm with a large prior.count in this case, so this effect is mitigated and ranking by logFC should also work well.

Do you need a "speed up" since camera is almost instantaneous anyway?

ADD COMMENT • link 5 months ago Gordon Smyth 52k

0

Entering edit mode

Tnx! Since I used all the gene sets downloaded from Msigdb, the camera() was actually much slower than I expected.

ADD REPLY • link 5 months ago Jay • 0

0

Entering edit mode

It is recommended to not use all gene sets from MSigDB (see here). There are (by my very rough estimation) about 58K gene sets in the MSigDB, and several of them have important legal considerations. Instead, carefully select a subset of GMT files that you think will be most useful for the experiment at hand. Then, you will need to decide if you want to adjust p-values across databases or within databases.

CAMERA and CAMERA-PR are still leagues faster than almost any other method for analyzing molecular signatures, especially those that are GSEA-like. Since it is recommended to use camera with inter.gene.cor = 0.01 (the default), you could also just fit a model to the genes, transform the matrix of moderated t-statistics from the MArrayLM object to their standard Normal equivalents with limma::zscoreT (this is what camera() does internally), and then use cameraPR separately on each column vector of the z matrix. This will avoid having to fit the model separately for each contrast or coefficient, so it will save some time. For example

# Fit the model
fit <- lmFit(object, design)
fit.contr <- contrasts.fit(fit, contrasts) # may not be applicable, depending on the design
fit.smooth <- eBayes(fit.contr)

zmat <- with(fit.smooth, zscoreT(t, df.total))

# create index (see limma::ids2indices() using zmat)

res <- lapply(colnames(zmat), function(coef_i) {
  out <- cameraPR(statistic = zmat[, coef_i], 
                  index = index, 
                  inter.gene.cor = 0.01)
  out$GeneSet <- rownames(out)

  return(out)
})
names(res) <- colnames(zmat)
res <- data.table::rbindlist(res, idcol = "contrast")

ADD REPLY • link 4 months ago Tyler Sagendorf ▴ 10

0

Entering edit mode

Thank you for your reply. I did ignore the legal considerations. That is a big problem. Totally agree. In the first step, I only took a breif check about all the gene sets. Will select interesting sets in the next analysis. I have tried limma::zscoreT. As for previous, using cameraPR has save me lost of time. Your method saved more. Tnx again!

ADD REPLY • link 4 months ago Jay • 0