Question

Battery gene sets for CAMERA limma

brov.olia • 0

@e1fb1374

Germany

Hi everyone,

I'm confused by the results of my CAMERA analysis. To build the indices, I used the gene set collections from MSigDB: I converted the GMT files to lists and built the indices from them. The row names of the initial count matrices are HGNC symbols, covering protein-coding genes as well as lncRNAs, miRNAs, etc. MSigDB lets users download each collection in two forms: Entrez IDs and HGNC symbols. When I convert my symbols to Entrez IDs and build the indices, the result is completely different from what I get when I build the indices from the symbols directly.

Example using symbols to build the indices:

camera_res1[1:5,]

                                                  NGenes Direction       PValue          FDR
RESPONSE_OF_EIF2AK4_GCN2_TO_AMINO_ACID_DEFICIENCY     98        Up 7.358697e-12 2.267950e-08
KEGG_RIBOSOME                                         85        Up 2.507014e-11 3.863308e-08
EUKARYOTIC_TRANSLATION_ELONGATION                     90        Up 2.086557e-10 1.862321e-07
SELENOAMINO_ACID_METABOLISM                          105        Up 2.417029e-10 1.862321e-07
CELLULAR_RESPONSE_TO_STARVATION                      149        Up 1.376982e-09 8.487719e-07

Example using Entrez IDs to build the indices:

camera_res2[1:5,]


                                                      NGenes Direction      PValue       FDR
SIGNALLING_TO_RAS                                         20      Down 0.000617136 0.9982597
PLASMA_LIPOPROTEIN_REMODELING                             19      Down 0.001480280 0.9982597
ACTIVATION_OF_TRKA_RECEPTORS                               2      Down 0.002682050 0.9982597
WP_MEVALONATE_ARM_OF_CHOLESTEROL_BIOSYNTHESIS_PATHWAY     12        Up 0.003880270 0.9982597
NABA_ECM_AFFILIATED                                       85      Down 0.003973647 0.9982597

In the first case (with symbols) I got a larger list of pathways (3082) than in the second (3041). Which result is more relevant? Do non-protein-coding RNAs really play such a crucial role in pathway significance?

CAMERA • 587 views

ADD COMMENT • link updated 13 months ago by James W. MacDonald 65k • written 13 months ago by brov.olia • 0

score 0 · Answer 1 · 2023-03-06

James W. MacDonald 65k

@james-w-macdonald-5106

United States

Without providing code, it's not possible to say exactly why there are differences. That said, if you want the most accurate results you should use NCBI IDs (what used to be called Entrez Gene IDs) rather than symbols, as the gene IDs are way more likely to uniquely identify a given gene.

ADD COMMENT • link 13 months ago James W. MacDonald 65k

Dear James,

Thank you for the reply. Please find my code below.

# Load the gene sets (Entrez ID and symbol versions)
hs.c2.cp.l <- gmt_to_list("Msig_entrez/c2.cp.v2022.1.Hs.entrez.gmt", cutoff = 0,
                          sep = "\thttp://www.gsea-msigdb.org/gsea/msigdb/human/geneset/.*?\t")
hs.c2.cp.symb.l <- gmt_to_list("Msig_symbols/c2.cp.v2023.1.Hs.symbols.gmt", cutoff = 0,
                               sep = "\thttp://www.gsea-msigdb.org/gsea/msigdb/human/geneset/.*?\t")
# Map symbols to Entrez IDs and create the indices
my_entrez <- mget(voom_out$genes$symbol, org.Hs.egSYMBOL2EG, ifnotfound = NA)
entrez_ind <- ids2indices(hs.c2.cp.l, my_entrez)
symbol_ind <- ids2indices(hs.c2.cp.symb.l, voom_out$genes$symbol)
# Run CAMERA
camera_res1 <- camera(voom_out$E, index = symbol_ind,
                      weights = voom_out$weights,
                      design = mydesign, contrast = mycontrast)

camera_res2 <- camera(voom_out$E, index = entrez_ind,
                      weights = voom_out$weights,
                      design = mydesign, contrast = mycontrast)

Thank you for the suggestion about Entrez IDs. With that approach, however, I do not see any pathway with FDR < 0.1.

ADD REPLY • link 13 months ago brov.olia • 0

If you do this

my_entrez <- mget(voom_out$genes$symbol, org.Hs.egSYMBOL2EG, ifnotfound = NA)

the result will be a list of IDs, and ids2indices won't work as you expect. The help page for that function describes its arguments as follows; note the second one:

Arguments:

gene.sets: list of character vectors, each vector containing the gene
          identifiers for a set of genes.

identifiers: character vector of gene identifiers.

And a list is not a character vector. If you don't have NCBI IDs in your 'genes' data.frame, then you should use symbols.
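For anyone who does want Entrez-based indices, one possible way to obtain the character vector that the help page asks for is to collapse the list to one ID per gene first. The sketch below is a minimal base-R illustration under stated assumptions, not a definitive recipe: `my_entrez` is a hand-made toy list standing in for the output of `mget(...)`; the symbols `NOTAGENE` and `MULTI` and the IDs "111", "222" and "999" are invented for illustration; keeping the first ID when a symbol maps to several is an arbitrary choice; and the `%in%` matching is only a stand-in for the index-building step, not limma's implementation.

```r
# Toy stand-in for mget(symbols, org.Hs.egSYMBOL2EG, ifnotfound = NA):
# a named list with one element per symbol. An element may be NA (no match)
# or hold several IDs (ambiguous symbol). "NOTAGENE"/"MULTI" are made up.
my_entrez <- list(
  TP53     = "7157",
  BRCA1    = "672",
  NOTAGENE = NA,
  MULTI    = c("111", "222")   # invented IDs, for illustration only
)

# Collapse the list to a character vector with exactly one entry per gene,
# in the original row order. Assumption: keep the first ID for multi-mappers.
entrez_vec <- vapply(my_entrez, function(x) as.character(x[1]), character(1))

# Toy gene sets keyed by Entrez ID (hypothetical set names and membership).
gene_sets <- list(
  SET_A = c("7157", "672"),
  SET_B = c("111", "999")
)

# Base-R stand-in for building indices: the row positions of each set's genes.
index <- lapply(gene_sets, function(s) which(entrez_vec %in% s))

print(is.list(my_entrez))       # the mget-style result is a list
print(is.character(entrez_vec)) # the collapsed result is a character vector
print(index)
```

With real data, the same `vapply()` call could be applied to the `mget()` result and the resulting vector passed as the `identifiers` argument. Genes left as `NA` match no set, so it is worth checking how many symbols failed to map before comparing results with the symbol-based run.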

ADD REPLY • link 13 months ago James W. MacDonald 65k