What are these genes? [How to get Ensembl IDs for them]
prabin.dm • 0
@prabindm-9986
United States

Hi, I need to get the Ensembl IDs for the genes in my dataset. I believe I have gene symbols, but I cannot figure out what these gene names are or how to convert them to Ensembl IDs.

> dge[[1]][grep("-",x = dge[[1]]$Gene),]
# A tibble: 25,989 × 5
   Gene          baseMean pvalue              padj  FoldChange         
   <chr>            <dbl> <chr>               <chr> <chr>              
 1 RP24-342J3.4     0.101 0.92788346559582402 NA    -1.3769054206920972
 2 CH36-217G15.3    0     NA                  NA    NA                 
 3 RP23-280C13.3    0     NA                  NA    NA                 
 4 RP24-235N21.3    0     NA                  NA    NA                 
 5 RP23-434H18.4    0     NA                  NA    NA                 
 6 RP23-4K22.1      0     NA                  NA    NA                 
 7 RP24-117F20.3    0     NA                  NA    NA                 
 8 RP23-363J15.4    0     NA                  NA    NA                 
 9 RP23-112D14.1    0     NA                  NA    NA                 
10 RP23-132K20.4    0.672 0.80928984414463001 NA    -1.7080714557114776
# … with 25,979 more rows

I have tried using AnnotationDbi as well as biomaRt, assuming these are symbols. But clearly they are not.

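library(dplyr)         # mutate() and %>%
library(purrr)         # map()
library(org.Mm.eg.db)  # mouse annotation; also attaches AnnotationDbi for mapIds()

# Add an Ensembl ID column to every table in the list, treating Gene as a symbol
dge2 <- dge %>% map(
            mutate,
            "ensemble_gene_id" = mapIds(org.Mm.eg.db,
                                        keys = Gene, keytype = "SYMBOL",
                                        column = "ENSEMBL",
                                        multiVals = "first"))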

> dge2[[1]][grep("-",x = dge2[[1]]$Gene),]
# A tibble: 25,989 × 6
   Gene        baseMean pvalue           padj  FoldChange       ensemble_gene_id
   <chr>          <dbl> <chr>            <chr> <chr>            <chr>           
 1 RP24-342J3…    0.101 0.9278834655958… NA    -1.376905420692… NA              
 2 CH36-217G1…    0     NA               NA    NA               NA              
 3 RP23-280C1…    0     NA               NA    NA               NA              
 4 RP24-235N2…    0     NA               NA    NA               NA              
 5 RP23-434H1…    0     NA               NA    NA               NA              
 6 RP23-4K22.1    0     NA               NA    NA               NA              
 7 RP24-117F2…    0     NA               NA    NA               NA              
 8 RP23-363J1…    0     NA               NA    NA               NA              
 9 RP23-112D1…    0     NA               NA    NA               NA              
10 RP23-132K2…    0.672 0.8092898441446… NA    -1.708071455711… NA 
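
One way to confirm that the names are simply absent from the annotation, rather than mis-mapped, is to measure what fraction of them appear among the valid keys. The sketch below is only an illustration: `match_rate` is a hypothetical helper, the toy keys are made up, and the final block runs only if `org.Mm.eg.db` and `AnnotationDbi` happen to be installed.

```r
# Fraction of gene names that appear in a set of valid keys.
# match_rate is a hypothetical helper, not part of any package.
match_rate <- function(genes, valid_keys) {
  mean(genes %in% valid_keys)
}

# Toy check with made-up keys (for illustration only).
match_rate(c("RP24-342J3.4", "Actb"), c("Actb", "Gapdh"))

# Against the real annotation package, if it is available.
if (requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  db <- org.Mm.eg.db::org.Mm.eg.db
  # Names taken from the table above.
  query <- c("RP24-342J3.4", "CH36-217G15.3")
  for (kt in c("SYMBOL", "ALIAS")) {
    valid <- AnnotationDbi::keys(db, keytype = kt)
    cat(kt, ":", match_rate(query, valid), "\n")
  }
}
```

A rate near zero for both `SYMBOL` and `ALIAS` would suggest the names come from a different naming scheme, not that the mapping call is wrong.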

Any suggestions will be appreciated.

AnnotationDbi biomaRt • 1.0k views

I guess these are TPF sequences. But surely you know where you got this dataset from, and hence what the data refer to?


Mouse gene symbols are not typically written in all capitals, but I was able to find a reference in NCBI to a mouse gene for CH36-217G15. Maybe this is not Ensembl data? Without knowing how the data was generated, it's going to be very difficult to determine what these gene symbols mean.
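
To search by the shorter form (CH36-217G15 rather than CH36-217G15.3), the numeric suffix can be stripped programmatically. This is a rough sketch under an assumption: that the names follow a `<library>-<plate><row><column>.<n>` layout, which is inferred only from the examples in the question. The regular expression is illustrative, and "Actb" is included only as a contrasting ordinary symbol.

```r
# Example names copied from the table in the question, plus one ordinary symbol.
genes <- c("RP24-342J3.4", "CH36-217G15.3", "RP23-4K22.1", "Actb")

# Assumed layout: library prefix, "-", plate number, row letter,
# column number, ".", numeric suffix.
clone_pattern <- "^[A-Z]+[0-9]+-[0-9]+[A-Z][0-9]+\\.[0-9]+$"

# Flag names that match the assumed layout.
is_clone_like <- grepl(clone_pattern, genes)

# Drop the trailing ".<n>" to get the shorter name to search for.
clone_name <- ifelse(is_clone_like, sub("\\.[0-9]+$", "", genes), NA_character_)

data.frame(Gene = genes, is_clone_like, clone_name)
```

The stripped names can then be looked up by hand, or the flag can be used to set these rows aside before mapping the remaining symbols.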
