What are these genes? [How to get Ensembl IDs for them]
prabin.dm
United States

Hi, I need to get the Ensembl IDs for the genes in my dataset. I believe I have gene symbols, but I cannot figure out what these gene names are or how to convert them to Ensembl IDs.

> dge[[1]][grep("-",x = dge[[1]]$Gene),]
# A tibble: 25,989 × 5
   Gene          baseMean pvalue              padj  FoldChange         
   <chr>            <dbl> <chr>               <chr> <chr>              
 1 RP24-342J3.4     0.101 0.92788346559582402 NA    -1.3769054206920972
 2 CH36-217G15.3    0     NA                  NA    NA                 
 3 RP23-280C13.3    0     NA                  NA    NA                 
 4 RP24-235N21.3    0     NA                  NA    NA                 
 5 RP23-434H18.4    0     NA                  NA    NA                 
 6 RP23-4K22.1      0     NA                  NA    NA                 
 7 RP24-117F20.3    0     NA                  NA    NA                 
 8 RP23-363J15.4    0     NA                  NA    NA                 
 9 RP23-112D14.1    0     NA                  NA    NA                 
10 RP23-132K20.4    0.672 0.80928984414463001 NA    -1.7080714557114776
# … with 25,979 more rows
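A side note on the `grep("-", ...)` filter: ordinary mouse symbols can also contain hyphens (for example H2-K1), so that filter may pick up more than the clone-style names. Below is a minimal base R sketch of a stricter filter. It assumes the names follow the pattern seen in the output above: a prefix such as RP23 or CH36, a hyphen, a plate/well code such as 280C13, then a `.N` suffix. `is_clone_name()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: flag names that look like clone-based identifiers,
# e.g. "RP23-280C13.3". The regex is an assumption inferred from the names
# shown above: two letters + two digits, "-", digits + letter + digits, ".N".
is_clone_name <- function(x) {
  grepl("^[A-Z]{2}[0-9]{2}-[0-9]+[A-Z][0-9]+\\.[0-9]+$", x)
}

# Unlike grep("-", ...), hyphenated mouse symbols such as "H2-K1" are not flagged
genes <- c("RP24-342J3.4", "CH36-217G15.3", "RP23-4K22.1", "H2-K1", "Actb")
is_clone_name(genes)
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```

Applied to the tables above, `dge[[1]][is_clone_name(dge[[1]]$Gene), ]` would select only the clone-style rows, so their count can be compared with the 25,989 rows returned by the hyphen filter.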

I have tried using AnnotationDbi as well as biomaRt, assuming these are symbols, but clearly they are not.

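library(dplyr)
library(purrr)
library(AnnotationDbi)
library(org.Mm.eg.db)
# For each results table, add the first Ensembl ID mapped from Gene (treated as a SYMBOL)
dge2 <- dge %>% map(~ mutate(.x,
                             ensemble_gene_id = mapIds(org.Mm.eg.db,
                                                       keys = Gene, keytype = "SYMBOL",
                                                       column = "ENSEMBL",
                                                       multiVals = "first")))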

> dge2[[1]][grep("-",x = dge2[[1]]$Gene),]
# A tibble: 25,989 × 6
   Gene        baseMean pvalue           padj  FoldChange       ensemble_gene_id
   <chr>          <dbl> <chr>            <chr> <chr>            <chr>           
 1 RP24-342J3…    0.101 0.9278834655958… NA    -1.376905420692… NA              
 2 CH36-217G1…    0     NA               NA    NA               NA              
 3 RP23-280C1…    0     NA               NA    NA               NA              
 4 RP24-235N2…    0     NA               NA    NA               NA              
 5 RP23-434H1…    0     NA               NA    NA               NA              
 6 RP23-4K22.1    0     NA               NA    NA               NA              
 7 RP24-117F2…    0     NA               NA    NA               NA              
 8 RP23-363J1…    0     NA               NA    NA               NA              
 9 RP23-112D1…    0     NA               NA    NA               NA              
10 RP23-132K2…    0.672 0.8092898441446… NA    -1.708071455711… NA 

Any suggestions will be appreciated.

AnnotationDbi biomaRt

I guess these are TPF (tiling path file) sequences. But surely you must know where you got this dataset from, and hence what the data refer to?


Mouse gene symbols are not typically all capitals (usually only the first letter is capitalized), but I was able to find a mouse gene in NCBI that references CH36-217G15. Maybe this is not Ensembl data? Without knowing how the data were generated, it will be very difficult to determine what these gene names mean.
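Following up on the point that everything depends on how the data were generated: if the annotation GTF used to build the count matrix can be tracked down, the name-to-ID mapping can be read directly from it instead of being guessed. The sketch below assumes a standard GTF with `gene` rows whose attribute column contains `gene_id` and `gene_name` entries in the usual `key "value";` form. The helper names and the toy lines (including the IDs) are made up for illustration only.

```r
# Sketch: build a gene_name -> gene_id lookup from a GTF's lines.
# Assumes standard GTF attribute syntax: key "value"; key "value";
extract_attr <- function(attrs, key) {
  # Key must start the column or follow a ";" so e.g. "havana_gene_id" is not matched
  pattern <- paste0('(^|;)[[:space:]]*', key, ' "([^"]*)"')
  hits <- regmatches(attrs, regexec(pattern, attrs))
  # Each hit is c(full_match, group1, group2); no match gives character(0)
  vapply(hits, function(h) if (length(h) == 3) h[3] else NA_character_,
         character(1))
}

gtf_gene_map <- function(gtf_lines) {
  gtf_lines <- gtf_lines[!startsWith(gtf_lines, "#")]  # drop header lines
  fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
  # Keep only rows whose feature type (column 3) is "gene"
  is_gene <- vapply(fields, function(f) length(f) >= 9 && f[3] == "gene",
                    logical(1))
  attrs <- vapply(fields[is_gene], function(f) f[9], character(1))
  data.frame(gene_id   = extract_attr(attrs, "gene_id"),
             gene_name = extract_attr(attrs, "gene_name"),
             stringsAsFactors = FALSE)
}

# Toy GTF lines (made-up IDs, for illustration only)
toy_gtf <- c(
  "#!genome-build example",
  paste0("1\tHAVANA\tgene\t100\t900\t.\t+\t.\t",
         'gene_id "toy_gene_1"; gene_name "RP23-280C13.3";'),
  paste0("1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\t",
         'gene_id "toy_gene_1"; transcript_id "toy_tx_1"; gene_name "RP23-280C13.3";')
)
gene_map <- gtf_gene_map(toy_gtf)
gene_map
```

With a real annotation one might call `gtf_gene_map(readLines("annotation.gtf"))` (hypothetical path) and then look up IDs with `gene_map$gene_id[match(dge[[1]]$Gene, gene_map$gene_name)]`. GENCODE-style IDs carry a version suffix such as `.1`, which `sub("\\.[0-9]+$", "", ...)` would strip if unversioned IDs are needed. A GTF without gene-level rows would return an empty table here, and the transcript or exon rows would have to be de-duplicated instead.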
