Search
Question: Camera gene set interpretation
0
18 months ago by
natalia.pietrosemoli0 wrote:

Dear all,

After some careful reading, I've settled for CAMERA as my favaorite gene set enrichment test.

However, in one of my datasets (condition 3 vs condition 1; 3 replicates, microarray data), I get a very confusing error:

I get  virtually the  same enriched gene sets (and direction!) when I perform the test on the first or on the second condition.  When I look at the genes, they are also the same! How can this be possible?

Here's some snipets:

Mi data are in this data.frame:

colnames(final.table)

'data.frame':    22518 obs. of  9 variables:
$Probe.Set.ID: Factor w/ 22518 levels "1415676_a_at",..: 1 2 3 4 5 6 7 8 9 10 ...$ SYMBOL      : Factor w/ 12142 levels "0610006L08Rik",..: 8523 3533 8506 1316 6878 1654 6973 8500 12091 6863 ...
$Entrez_IDs : Factor w/ 12142 levels "100017","100019",..: 3274 7450 8495 2718 11955 1075 11391 3264 10576 11953 ...$ Cond1_1.CEL : num  11.57 8.29 10.63 9.87 8.32 ...
$Cond1_2.CEL : num 11.48 8.16 10.47 9.63 8.06 ...$ Cond1_3.CEL : num  11.39 8.41 11.01 10.18 8.63 ...
$Cond3_1.CEL : num 11.11 7.93 10.58 8.89 7.98 ...$ Cond3_2.CEL : num  11.13 8.16 10.51 9.07 7.67 ...
$Cond3_3.CEL : num 11.04 8.62 11.04 9.1 7.82 ... And then I construct my test: grp = c( "Cond1", "Cond3") Group=as.factor(c(rep(grp[1], length(grep(grp[1], colnames(final.table),ignore.case = F ))), rep(grp[2], length(grep(grp[2], colnames(final.table),ignore.case = F ))))) des = model.matrix(~0+Group) colnames(des) <- levels(Group) contr <- makeContrasts(Cond3-Cond1, levels = des) str_split(rownames(final.table), pattern = "_") ->k unlist(lapply(k,function(x) x[1]))->k ids2indices(iset,k)-> indices camera(final.table, indices, des, contrast=2)-> current.cam.up.Cond3 camera(final.table,, indices, des, contrast=1)-> current.cam.up.Cond1 Here I'm showing the first 6 gene sets, but you can trust me that they are pretty much the same ones... current.cam.up.Cond3 NGenes Direction PValue FDR HALLMARK_MYC_TARGETS_V1 452 Up 3.867097e-12 1.933549e-10 HALLMARK_OXIDATIVE_PHOSPHORYLATION 379 Up 1.868544e-10 4.671361e-09 HALLMARK_UNFOLDED_PROTEIN_RESPONSE 262 Up 1.813248e-05 3.022080e-04 HALLMARK_MTORC1_SIGNALING 492 Up 4.983641e-05 6.229551e-04 HALLMARK_TNFA_SIGNALING_VIA_NFKB 419 Up 4.852641e-04 4.852641e-03 HALLMARK_MYC_TARGETS_V2 138 Up 7.818134e-04 6.515112e-03 > head(res.fil.up.A) NGenes Direction PValue FDR HALLMARK_MYC_TARGETS_V1 452 Up 6.978126e-14 3.489063e-12 HALLMARK_OXIDATIVE_PHOSPHORYLATION 379 Up 1.145075e-10 2.862686e-09 HALLMARK_E2F_TARGETS 509 Up 1.303369e-08 2.172282e-07 HALLMARK_G2M_CHECKPOINT 592 Up 4.696686e-05 5.870858e-04 HALLMARK_UNFOLDED_PROTEIN_RESPONSE 262 Up 6.785470e-05 6.785470e-04 HALLMARK_MTORC1_SIGNALING 492 Up 2.174479e-04 1.812066e-03 I would expect that if the genes sets are significicant and the same in the two conditions, the direction should change.. Thanks a lot for helping solve this puzzle, Natalia > sessionInfo() R version 3.3.1 (2016-06-21) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.11.6 (El Capitan) locale: [1] C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] mouse4302cdf_2.18.0 genefilter_1.54.2 affyPLM_1.48.0 preprocessCore_1.34.0 [5] gcrma_2.44.0 affy_1.50.0 mouse4302.db_3.2.3 org.Mm.eg.db_3.3.0 [9] AnnotationDbi_1.34.4 rgl_0.97.0 dplyr_0.5.0 ggrepel_0.6.5 [13] ggplot2_2.2.1 RColorBrewer_1.1-2 stringr_1.1.0 knitr_1.15.1 [17] gplots_3.0.1 calibrate_1.7.2 MASS_7.3-45 FactoMineR_1.34 [21] oligo_1.36.1 Biostrings_2.40.2 XVector_0.12.1 IRanges_2.6.1 [25] S4Vectors_0.10.3 oligoClasses_1.34.0 limma_3.28.21 Biobase_2.32.0 [29] BiocGenerics_0.18.0 loaded via a namespace (and not attached): [1] jsonlite_1.2 splines_3.3.1 foreach_1.4.3 gtools_3.5.0 [5] shiny_1.0.0 assertthat_0.1 highr_0.6 RSQLite_1.1-2 [9] lattice_0.20-34 digest_0.6.11 GenomicRanges_1.24.3 colorspace_1.3-2 [13] Matrix_1.2-8 htmltools_0.3.5 httpuv_1.3.3 plyr_1.8.4 [17] XML_3.98-1.5 affxparser_1.44.0 zlibbioc_1.18.0 xtable_1.8-2 [21] scales_0.4.1 gdata_2.17.0 affyio_1.42.0 ff_2.2-13 [25] annotate_1.50.1 tibble_1.2 SummarizedExperiment_1.2.3 lazyeval_0.2.0 [29] survival_2.40-1 magrittr_1.5 mime_0.5 evaluate_0.10 [33] memoise_1.0.0 BiocInstaller_1.22.3 tools_3.3.1 munsell_0.4.3 [37] cluster_2.0.5 flashClust_1.01-2 GenomeInfoDb_1.8.7 caTools_1.17.1 [41] RCurl_1.95-4.8 grid_3.3.1 iterators_1.0.8 htmlwidgets_0.8 [45] leaps_3.0 labeling_0.3 bitops_1.0-6 gtable_0.2.0 [49] codetools_0.2-15 DBI_0.5-1 R6_2.2.0 bit_1.1-12 [53] KernSmooth_2.23-15 stringi_1.1.2 Rcpp_0.12.9 scatterplot3d_0.3-38 ADD COMMENTlink modified 18 months ago by Gordon Smyth34k • written 18 months ago by natalia.pietrosemoli0 I was just trying a shortcut to typing my whole code. Which is here. And it works, my question still remains the same. runTest <- function(iset,name, final.table, grp){ c(grep(grp[1], colnames(final.table),ignore.case = F ), grep(grp[2], colnames(final.table),ignore.case = F))->curr.ind exp.mat.0<-final.table[,curr.ind] dummy_ind <- seq(from=1, to =dim(exp.mat.0)[1]) rownames(exp.mat.0)<- paste0(final.table$Entrez_IDs,"_",dummy_ind)  # because we have non-unique values

# remove NA rows
complete.cases(exp.mat.0)->a
exp.mat.0 <- exp.mat.0[a,]

Group=as.factor(c(rep(grp[1], length(grep(grp[1], colnames(final.table),ignore.case = F ))),
rep(grp[2], length(grep(grp[2], colnames(final.table),ignore.case = F )))))
des = model.matrix(~0+Group)
colnames(des) <- levels(Group)

contr <- makeContrasts(Cond3-Cond1, levels = des)
str_split(rownames(exp.mat.0), pattern = "_") ->k
unlist(lapply(k,function(x) x[1]))->k

ids2indices(iset,k)-> indices
camera(exp.mat.0, indices, des, contrast=2)-> current.cam  #.up.Q
camera(exp.mat.0, indices, des, contrast=1)-> current.cam

entrez.ids<- lapply(indices, function(x) k[x])
symbol.ids <- lapply(entrez.ids, function(x) match(x, final.table\$Entrez_IDs))
col.id <- which(names(final.table) == "SYMBOL")
symbol.names <- lapply(symbol.ids, function(x) paste(as.character(final.table[unlist(x),col.id]), collapse = ";"))

genes.cam <-vector()
match(rownames(current.cam),names(symbol.names))->b
genes.cam <-unlist(symbol.names[b])

cbind(current.cam, genes.cam)->current.camera.tot

res <-list( "camera"= current.camera.tot)

write.table(cbind(rownames(current.camera.tot),current.camera.tot), file=paste0(outdir,name,".",grp[2],"vs",grp[1],".camera.txt"), row.names = FALSE, sep = "\t", col.names =
c("geneSet",colnames(current.camera.tot)))
return(res)
}

isets <- list()
isets[[1]] <- Mm.H
isets[[2]] <- Mm.c2
names(isets) <- c("H","C2")

# get enrichments
enrichment.H <- list()
enrichment.C2 <- list()

for (i in 1:dim(comb.all)[2]){

runTest(isets[[1]], "H", mat, c(comb.all[2*i-1],comb.all[2*i]))->enrichment.H[[i]]
#runTest(isets[[2]], "C2", mat, c(comb.all[2*i-1],comb.all[2*i]))->enrichment.C2[[i]]

names(enrichment.H)[[i]] <-  paste0(comb.all[2*i],".vs.",comb.all[2*i-1])
#names(enrichment.C2)[[i]] <-  paste0(comb.all[2*i-1],".vs.",comb.all[2*i])

}

If I''ve previously defined my design:

> des
Mock_3h Z_3h
1       1    0
2       1    0
3       1    0
4       0    1
5       0    1
6       0    1

wouldn't this instruction:

camera(exp.mat.0, indices, des, contrast=2)

specifiy that I run my test for my condition Z_3h?

this is how I interpret the contrast parameter.

"Can be an integer specifying a column of 'design'..."

What James is saying is that since your design does not have an intercept term (ie. you are using a "cell means" model), you need to test a contrast among the columns of your design matrix by specifying a vector of coefficients which is as long as the number of columns of your design matrix, and not just a single column (index) into our design matrix.

camera(exp.mat.0, indices, des, contrast=2)

is essentially testing something you don't really care to answer, ie. you are asking if the mean expression of the genes is different from zero in your "Z_3h" group.

You probably would rather ask if the mean expression is different between your two groups, in which case you need to pass the vector that specifies your contrast across the columns of your design (as mentioned in the docs), so something like:

camera(exp.mat.0, indices, des, contrast=c(-1, 1))

Which is testing the difference in expression calculated by your Z_3h - Mock_3h groups, given the design you specified in your comment above.

Putting it a third way: when James said camera knows nothing about your contrast matrix, he means that it doesn't know that the "2" in your contrast=2 is referencing the second column of a contrast matrix you built somewhere else (again, beating a dead horse: that contrast matrix has as many rows as you have columns in your design matrix). You need to pass the vector that is the second column of your contrast matrix directly.

Make sense?

0
18 months ago by
United States
James W. MacDonald47k wrote:

It's probably more useful if you show the actual code you used, rather than just re-typing things. For example, this line:

camera(final.table,, indices, des, contrast=1)-> current.cam.up.Cond1

won't actually run, due to the double commas after final.table. Plus, while the assignment operator is bi-directional, it's easier for people to decipher your code if you follow the convention of right to left assignment (<-) rather than randomly switching back and forth in your code.

Also do note that camera doesn't use (or even know about) your contrast matrix, so specifying contrast = 1 or contrast = 2 isn't a useful thing to do when you use a cell means model. In that situation the statistic you are using is testing that the expression of each gene is different from zero, rather than different between the two groups. This should be clear from the help page:

contrast: contrast of the linear model coefficients for which the test
is required. Can be an integer specifying a column of
'design', or else a numeric vector of same length as the
number of columns of 'design'.
0
18 months ago by
Gordon Smyth34k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth34k wrote:

You could use

camera(final.table, indices, des, contrast=c(-1,1))

But why not make it even easier:

des <- model.matrix(~group)
camera(final.table, indices, des)

Thank you guys for the answers, took some time, but finally got it!