Hi everyone,
I'm confused by the results of my CAMERA analysis. To build the indexes, I used the collection of gene sets from MSigDB: I converted the gmt files to lists and built the indexes from those. The initial count matrices have HGNC symbols as row names, which include protein-coding genes as well as lncRNA, miRNA, etc. MSigDB lets users download two versions of each gene set collection: one with Entrez IDs and one with HGNC symbols. When I convert the symbols to Entrez IDs and build the indexes, the result is completely different from what I get when I use symbols to build the indexes.
Example using symbols to build the indexes:
camera_res1[1:5,]
                                                  NGenes Direction       PValue          FDR
RESPONSE_OF_EIF2AK4_GCN2_TO_AMINO_ACID_DEFICIENCY     98        Up 7.358697e-12 2.267950e-08
KEGG_RIBOSOME                                         85        Up 2.507014e-11 3.863308e-08
EUKARYOTIC_TRANSLATION_ELONGATION                     90        Up 2.086557e-10 1.862321e-07
SELENOAMINO_ACID_METABOLISM                          105        Up 2.417029e-10 1.862321e-07
CELLULAR_RESPONSE_TO_STARVATION                      149        Up 1.376982e-09 8.487719e-07
Example using Entrez IDs to build the indexes:
camera_res2[1:5,]
                                                      NGenes Direction      PValue       FDR
SIGNALLING_TO_RAS                                         20      Down 0.000617136 0.9982597
PLASMA_LIPOPROTEIN_REMODELING                             19      Down 0.001480280 0.9982597
ACTIVATION_OF_TRKA_RECEPTORS                               2      Down 0.002682050 0.9982597
WP_MEVALONATE_ARM_OF_CHOLESTEROL_BIOSYNTHESIS_PATHWAY     12        Up 0.003880270 0.9982597
NABA_ECM_AFFILIATED                                       85      Down 0.003973647 0.9982597
In the first case (with symbols) I got a larger list of pathways (3082); in the second it was 3041. Which result is more relevant? Do non-protein-coding RNAs really play such a crucial role in pathway significance?
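One way to check whether the two runs are even looking at the same genes is to compare how many rows of the count matrix each set of indexes actually picks up. The following is only a minimal sketch of the gmt-to-list-to-index step on toy data: `read_gmt_lines`, the gene names and the set names are all made up for illustration, and only `ids2indices` comes from limma.

```r
library(limma)

# Hypothetical helper: turn GMT-formatted lines (name, description, genes...)
# into a named list of gene identifiers.
read_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])      # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

# Toy GMT content and toy row names (all identifiers are invented).
gmt_lines <- c(
  "SET_1\thttp://example.org\tGENE_A\tGENE_B",
  "SET_2\thttp://example.org\tGENE_D\tGENE_X"
)
gene_sets <- read_gmt_lines(gmt_lines)
row_ids <- c("GENE_A", "GENE_B", "LNC_C", "GENE_D")   # e.g. rownames(counts)

# Build the indexes; the second argument is a plain character vector.
idx <- ids2indices(gene_sets, row_ids)

# Diagnostics: genes matched per set, and fraction of rows in any set.
genes_per_set <- lengths(idx)
row_coverage <- mean(row_ids %in% unlist(gene_sets))
```

Running the same diagnostics on both the symbol-based and the Entrez-based indexes should show whether the ID conversion lost or misaligned rows, which would be a more likely explanation for the difference than the non-coding genes themselves.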
Dear James,
Thank you for the reply. Please find my code below.
Thank you for the suggestion about Entrez IDs. With that approach, however, I do not see any pathway with FDR < 0.1.
If you do this, the result will be a list of IDs, and ids2indices won't work as you expect. The second argument for that function is expected to be a character vector of identifiers, and a list is not a character vector. If you don't have NCBI IDs in your 'genes' data.frame, then you should use symbols.
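To illustrate the point about lists versus character vectors, here is a rough sketch on toy data. It assumes the symbol-to-ID mapping came back as a list with one element per row of the count matrix; `id_list`, the IDs and the set names are invented, and keeping the first ID per gene is just one possible choice, not a recommendation.

```r
library(limma)

# Hypothetical mapping returned as a list: some genes have several IDs,
# some (e.g. a lncRNA) have none.
id_list <- list(
  GENE_A = "101",
  GENE_B = c("102", "103"),
  LNC_C  = NA_character_,
  GENE_D = "104"
)

# Collapse to a character vector with one entry per row, in row order.
# Here the first ID is kept; unmapped genes stay NA.
ids <- unname(vapply(id_list, function(x) as.character(x[1]), character(1)))

# Toy gene sets keyed by the same made-up IDs.
gene_sets <- list(
  SET_1 = c("101", "102"),
  SET_2 = c("104", "999"),
  SET_3 = "555"            # no gene in the data; dropped by remove.empty = TRUE
)

idx <- ids2indices(gene_sets, ids)
```

The important part is that `ids` has the same length and order as the rows of the expression matrix, so the indexes returned by `ids2indices` point at the right rows when passed to `camera`.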