I'm confused by the results of my CAMERA analysis. To build the indexes, I used the collection of gene sets from MSigDB: I converted the gmt files to lists and built the indexes from those. The row names of the initial count matrices are HGNC symbols, covering protein-coding genes as well as lncRNAs, miRNAs, and other non-coding RNAs. MSigDB lets users download the gene sets in two forms: Entrez IDs and HGNC symbols. When I convert the symbols to Entrez IDs and build the indexes, the result is completely different from what I get when I build the indexes from symbols.
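For reference, a minimal sketch of the index-building step described above, assuming the gmt file has the usual layout (set name, description, then the genes, all tab-separated). The helper `read_gmt`, the toy gmt lines, and the symbols are hypothetical; `ids2indices` is the limma function and is only called if limma is installed.

```r
# Hypothetical helper: parse gmt lines into a named list of gene vectors.
read_gmt <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])  # drop set name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

# Toy gmt content (made up); in practice this would come from readLines() on the file.
gmt_lines <- c(
  "SET_A\thttp://example.org/a\tTP53\tEGFR\tMYC",
  "SET_B\thttp://example.org/b\tGAPDH\tACTB"
)
gene_sets <- read_gmt(gmt_lines)

# Row names of the count matrix (toy values, including one symbol that is in no set).
symbols <- c("ACTB", "MYC", "MIR21", "TP53", "GAPDH")

# Build the indexes: the identifiers must be of the same type as the IDs in the sets.
if (requireNamespace("limma", quietly = TRUE)) {
  idx <- limma::ids2indices(gene_sets, identifiers = symbols)
  print(idx)
}
```

The identifiers passed here are symbols because the toy sets contain symbols; with the Entrez version of the gmt file, the identifiers would need to be Entrez IDs in the same row order as the count matrix.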
Example with symbols to build indexes
camera_res1[1:5,]
                                                  NGenes Direction       PValue          FDR
RESPONSE_OF_EIF2AK4_GCN2_TO_AMINO_ACID_DEFICIENCY     98        Up 7.358697e-12 2.267950e-08
KEGG_RIBOSOME                                         85        Up 2.507014e-11 3.863308e-08
EUKARYOTIC_TRANSLATION_ELONGATION                     90        Up 2.086557e-10 1.862321e-07
SELENOAMINO_ACID_METABOLISM                          105        Up 2.417029e-10 1.862321e-07
CELLULAR_RESPONSE_TO_STARVATION                      149        Up 1.376982e-09 8.487719e-07
Example with entrez ids for indexing
camera_res2[1:5,]
                                                      NGenes Direction      PValue       FDR
SIGNALLING_TO_RAS                                         20      Down 0.000617136 0.9982597
PLASMA_LIPOPROTEIN_REMODELING                             19      Down 0.001480280 0.9982597
ACTIVATION_OF_TRKA_RECEPTORS                               2      Down 0.002682050 0.9982597
WP_MEVALONATE_ARM_OF_CHOLESTEROL_BIOSYNTHESIS_PATHWAY     12        Up 0.003880270 0.9982597
NABA_ECM_AFFILIATED                                       85      Down 0.003973647 0.9982597
In the first case (with symbols) I got a larger list of pathways (3082); in the second (with Entrez IDs) it was 3041. Which result is more reliable? Do non-protein-coding RNAs really play such a crucial role in pathway significance?
Thank you for the reply. Please find my code below.
Thank you for the suggestion about Entrez IDs. With that approach, however, I do not see any pathway with FDR < 0.1.
If you do this
The result will be a list of IDs, and ids2indices won't work as you expect. The second argument for that function is identifiers, which should be a character vector of gene identifiers, and a list is not a character vector. If you don't have NCBI IDs in your 'genes' data.frame, then you should use symbols.
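To illustrate the point, here is a small sketch in base R with a made-up symbol-to-ID lookup (the table, symbols, and IDs are toy values, not real annotation). It assumes the conversion step returned a list; the idea is that the identifiers need to be a plain character vector with one entry per row of the count matrix, in row order, with NA where a symbol has no ID.

```r
# Toy lookup table (made-up IDs, for illustration only).
lookup <- c(GENE_A = "101", GENE_B = "102", GENE_C = "103")
symbols <- c("GENE_B", "LNC_X", "GENE_A")  # row order of the count matrix

# A list-returning lookup: one list element per symbol.
id_list <- lapply(symbols, function(s) unname(lookup[s]))
is.character(id_list)  # FALSE: this is a list, not a character vector

# One ID per row, NA where there is no match, same order as the rows.
ids <- unname(lookup[symbols])
is.character(ids)      # TRUE

# Toy gene sets keyed by the same kind of ID.
gene_sets <- list(SET_1 = c("101", "102"), SET_2 = c("103", "999"))

# Row positions of each set's members, which is what the indexes represent.
idx <- lapply(gene_sets, function(g) which(ids %in% g))
print(idx)
```

Rows without an ID (such as the non-coding symbol in this toy example) simply never match any set, so they drop out of every index.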