Search
Question: Camera gene set interpretation
0
gravatar for natalia.pietrosemoli
9 months ago by
natalia.pietrosemoli0 wrote:

Dear all,

After some careful reading, I've settled for CAMERA as my favaorite gene set enrichment test.

However, in one of my datasets (condition 3 vs condition 1; 3 replicates, microarray data), I get a very confusing error:

I get  virtually the  same enriched gene sets (and direction!) when I perform the test on the first or on the second condition.  When I look at the genes, they are also the same! How can this be possible?

Here's some snipets:

Mi data are in this data.frame:

colnames(final.table)

'data.frame':    22518 obs. of  9 variables:
 $ Probe.Set.ID: Factor w/ 22518 levels "1415676_a_at",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ SYMBOL      : Factor w/ 12142 levels "0610006L08Rik",..: 8523 3533 8506 1316 6878 1654 6973 8500 12091 6863 ...
 $ Entrez_IDs  : Factor w/ 12142 levels "100017","100019",..: 3274 7450 8495 2718 11955 1075 11391 3264 10576 11953 ...
 $ Cond1_1.CEL : num  11.57 8.29 10.63 9.87 8.32 ...
 $ Cond1_2.CEL : num  11.48 8.16 10.47 9.63 8.06 ...
 $ Cond1_3.CEL : num  11.39 8.41 11.01 10.18 8.63 ...
 $ Cond3_1.CEL : num  11.11 7.93 10.58 8.89 7.98 ...
 $ Cond3_2.CEL : num  11.13 8.16 10.51 9.07 7.67 ...
 $ Cond3_3.CEL : num  11.04 8.62 11.04 9.1 7.82 ...

And then I construct my test:

grp = c( "Cond1",  "Cond3")

Group=as.factor(c(rep(grp[1], length(grep(grp[1], colnames(final.table),ignore.case = F ))),
          rep(grp[2], length(grep(grp[2], colnames(final.table),ignore.case = F )))))
  des = model.matrix(~0+Group)
  colnames(des) <- levels(Group)
 
  contr <- makeContrasts(Cond3-Cond1, levels = des)

  str_split(rownames(final.table), pattern = "_") ->k
  unlist(lapply(k,function(x) x[1]))->k

ids2indices(iset,k)-> indices

camera(final.table, indices, des, contrast=2)-> current.cam.up.Cond3 
camera(final.table,, indices, des, contrast=1)-> current.cam.up.Cond1 

Here I'm showing the first 6 gene sets, but you can trust me that they are pretty much the same ones...

current.cam.up.Cond3 

                                   NGenes Direction       PValue          FDR
HALLMARK_MYC_TARGETS_V1               452        Up 3.867097e-12 1.933549e-10
HALLMARK_OXIDATIVE_PHOSPHORYLATION    379        Up 1.868544e-10 4.671361e-09
HALLMARK_UNFOLDED_PROTEIN_RESPONSE    262        Up 1.813248e-05 3.022080e-04
HALLMARK_MTORC1_SIGNALING             492        Up 4.983641e-05 6.229551e-04
HALLMARK_TNFA_SIGNALING_VIA_NFKB      419        Up 4.852641e-04 4.852641e-03
HALLMARK_MYC_TARGETS_V2               138        Up 7.818134e-04 6.515112e-03

> head(res.fil.up.A)
                                   NGenes Direction       PValue          FDR
HALLMARK_MYC_TARGETS_V1               452        Up 6.978126e-14 3.489063e-12
HALLMARK_OXIDATIVE_PHOSPHORYLATION    379        Up 1.145075e-10 2.862686e-09
HALLMARK_E2F_TARGETS                  509        Up 1.303369e-08 2.172282e-07
HALLMARK_G2M_CHECKPOINT               592        Up 4.696686e-05 5.870858e-04
HALLMARK_UNFOLDED_PROTEIN_RESPONSE    262        Up 6.785470e-05 6.785470e-04
HALLMARK_MTORC1_SIGNALING             492        Up 2.174479e-04 1.812066e-03
                                                                                    

I would expect that if the genes sets are significicant and the same in the two conditions, the direction should change..

 

Thanks a lot for helping solve this puzzle,

 

Natalia

 

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mouse4302cdf_2.18.0   genefilter_1.54.2     affyPLM_1.48.0        preprocessCore_1.34.0
 [5] gcrma_2.44.0          affy_1.50.0           mouse4302.db_3.2.3    org.Mm.eg.db_3.3.0   
 [9] AnnotationDbi_1.34.4  rgl_0.97.0            dplyr_0.5.0           ggrepel_0.6.5        
[13] ggplot2_2.2.1         RColorBrewer_1.1-2    stringr_1.1.0         knitr_1.15.1         
[17] gplots_3.0.1          calibrate_1.7.2       MASS_7.3-45           FactoMineR_1.34      
[21] oligo_1.36.1          Biostrings_2.40.2     XVector_0.12.1        IRanges_2.6.1        
[25] S4Vectors_0.10.3      oligoClasses_1.34.0   limma_3.28.21         Biobase_2.32.0       
[29] BiocGenerics_0.18.0  

loaded via a namespace (and not attached):
 [1] jsonlite_1.2               splines_3.3.1              foreach_1.4.3              gtools_3.5.0              
 [5] shiny_1.0.0                assertthat_0.1             highr_0.6                  RSQLite_1.1-2             
 [9] lattice_0.20-34            digest_0.6.11              GenomicRanges_1.24.3       colorspace_1.3-2          
[13] Matrix_1.2-8               htmltools_0.3.5            httpuv_1.3.3               plyr_1.8.4                
[17] XML_3.98-1.5               affxparser_1.44.0          zlibbioc_1.18.0            xtable_1.8-2              
[21] scales_0.4.1               gdata_2.17.0               affyio_1.42.0              ff_2.2-13                 
[25] annotate_1.50.1            tibble_1.2                 SummarizedExperiment_1.2.3 lazyeval_0.2.0            
[29] survival_2.40-1            magrittr_1.5               mime_0.5                   evaluate_0.10             
[33] memoise_1.0.0              BiocInstaller_1.22.3       tools_3.3.1                munsell_0.4.3             
[37] cluster_2.0.5              flashClust_1.01-2          GenomeInfoDb_1.8.7         caTools_1.17.1            
[41] RCurl_1.95-4.8             grid_3.3.1                 iterators_1.0.8            htmlwidgets_0.8           
[45] leaps_3.0                  labeling_0.3               bitops_1.0-6               gtable_0.2.0              
[49] codetools_0.2-15           DBI_0.5-1                  R6_2.2.0                   bit_1.1-12                
[53] KernSmooth_2.23-15         stringi_1.1.2              Rcpp_0.12.9                scatterplot3d_0.3-38

 

 

ADD COMMENTlink modified 9 months ago by Gordon Smyth32k • written 9 months ago by natalia.pietrosemoli0

I was just trying a shortcut to typing my whole code. Which is here. And it works, my question still remains the same.

runTest <- function(iset,name, final.table, grp){
  c(grep(grp[1], colnames(final.table),ignore.case = F ), grep(grp[2], colnames(final.table),ignore.case = F))->curr.ind
  exp.mat.0<-final.table[,curr.ind]
  dummy_ind <- seq(from=1, to =dim(exp.mat.0)[1])
  rownames(exp.mat.0)<- paste0(final.table$Entrez_IDs,"_",dummy_ind)  # because we have non-unique values
 
  # remove NA rows
  complete.cases(exp.mat.0)->a
  exp.mat.0 <- exp.mat.0[a,]
 
  Group=as.factor(c(rep(grp[1], length(grep(grp[1], colnames(final.table),ignore.case = F ))),
          rep(grp[2], length(grep(grp[2], colnames(final.table),ignore.case = F )))))
  des = model.matrix(~0+Group)
  colnames(des) <- levels(Group)
 
  contr <- makeContrasts(Cond3-Cond1, levels = des)
  str_split(rownames(exp.mat.0), pattern = "_") ->k
  unlist(lapply(k,function(x) x[1]))->k
 
  ids2indices(iset,k)-> indices
  camera(exp.mat.0, indices, des, contrast=2)-> current.cam  #.up.Q
  camera(exp.mat.0, indices, des, contrast=1)-> current.cam
 
  entrez.ids<- lapply(indices, function(x) k[x])
  symbol.ids <- lapply(entrez.ids, function(x) match(x, final.table$Entrez_IDs))
  col.id <- which(names(final.table) == "SYMBOL")
  symbol.names <- lapply(symbol.ids, function(x) paste(as.character(final.table[unlist(x),col.id]), collapse = ";"))
 
  genes.cam <-vector()
  match(rownames(current.cam),names(symbol.names))->b
  genes.cam <-unlist(symbol.names[b])
 
  cbind(current.cam, genes.cam)->current.camera.tot
 
  res <-list( "camera"= current.camera.tot)
 
  write.table(cbind(rownames(current.camera.tot),current.camera.tot), file=paste0(outdir,name,".",grp[2],"vs",grp[1],".camera.txt"), row.names = FALSE, sep = "\t", col.names =
                c("geneSet",colnames(current.camera.tot)))  
  return(res)
}

load(paste0(gene.sets.dir,"mouse_c2_v5p1.rdata"))
load(paste0(gene.sets.dir,"mouse_H_v5p1.rdata"))
isets <- list()
isets[[1]] <- Mm.H
isets[[2]] <- Mm.c2
names(isets) <- c("H","C2")

# get enrichments
enrichment.H <- list()
enrichment.C2 <- list()

for (i in 1:dim(comb.all)[2]){
 
  runTest(isets[[1]], "H", mat, c(comb.all[2*i-1],comb.all[2*i]))->enrichment.H[[i]]  
  #runTest(isets[[2]], "C2", mat, c(comb.all[2*i-1],comb.all[2*i]))->enrichment.C2[[i]]
 
  names(enrichment.H)[[i]] <-  paste0(comb.all[2*i],".vs.",comb.all[2*i-1])
  #names(enrichment.C2)[[i]] <-  paste0(comb.all[2*i-1],".vs.",comb.all[2*i])
 
}

ADD REPLYlink written 9 months ago by natalia.pietrosemoli0

My answer remains the same. Did you read it?

ADD REPLYlink written 9 months ago by James W. MacDonald45k

If I''ve previously defined my design:

> des
  Mock_3h Z_3h
1       1    0
2       1    0
3       1    0
4       0    1
5       0    1
6       0    1

wouldn't this instruction:

camera(exp.mat.0, indices, des, contrast=2)

specifiy that I run my test for my condition Z_3h?

this is how I interpret the contrast parameter.

"Can be an integer specifying a column of 'design'..."

 

ADD REPLYlink modified 9 months ago by Gordon Smyth32k • written 9 months ago by natalia.pietrosemoli0

What James is saying is that since your design does not have an intercept term (ie. you are using a "cell means" model), you need to test a contrast among the columns of your design matrix by specifying a vector of coefficients which is as long as the number of columns of your design matrix, and not just a single column (index) into our design matrix.

In your case, this call:

camera(exp.mat.0, indices, des, contrast=2)

is essentially testing something you don't really care to answer, ie. you are asking if the mean expression of the genes is different from zero in your "Z_3h" group.

You probably would rather ask if the mean expression is different between your two groups, in which case you need to pass the vector that specifies your contrast across the columns of your design (as mentioned in the docs), so something like:

camera(exp.mat.0, indices, des, contrast=c(-1, 1))

Which is testing the difference in expression calculated by your Z_3h - Mock_3h groups, given the design you specified in your comment above.

Putting it a third way: when James said camera knows nothing about your contrast matrix, he means that it doesn't know that the "2" in your contrast=2 is referencing the second column of a contrast matrix you built somewhere else (again, beating a dead horse: that contrast matrix has as many rows as you have columns in your design matrix). You need to pass the vector that is the second column of your contrast matrix directly.

Make sense?

ADD REPLYlink written 9 months ago by Steve Lianoglou12k
0
gravatar for James W. MacDonald
9 months ago by
United States
James W. MacDonald45k wrote:

It's probably more useful if you show the actual code you used, rather than just re-typing things. For example, this line:

camera(final.table,, indices, des, contrast=1)-> current.cam.up.Cond1 

won't actually run, due to the double commas after final.table. Plus, while the assignment operator is bi-directional, it's easier for people to decipher your code if you follow the convention of right to left assignment (<-) rather than randomly switching back and forth in your code.

Also do note that camera doesn't use (or even know about) your contrast matrix, so specifying contrast = 1 or contrast = 2 isn't a useful thing to do when you use a cell means model. In that situation the statistic you are using is testing that the expression of each gene is different from zero, rather than different between the two groups. This should be clear from the help page:

contrast: contrast of the linear model coefficients for which the test
          is required. Can be an integer specifying a column of
          'design', or else a numeric vector of same length as the
          number of columns of 'design'.
ADD COMMENTlink written 9 months ago by James W. MacDonald45k
0
gravatar for Gordon Smyth
9 months ago by
Gordon Smyth32k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth32k wrote:

You could use

camera(final.table, indices, des, contrast=c(-1,1))

But why not make it even easier:

des <- model.matrix(~group)
camera(final.table, indices, des)
ADD COMMENTlink modified 9 months ago • written 9 months ago by Gordon Smyth32k

Thank you guys for the answers, took some time, but finally got it!

 

ADD REPLYlink written 9 months ago by natalia.pietrosemoli0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 201 users visited in the last hour