Question

Ensembl gene ID conversion

trumbia • 0

Hello,

I have some IDs in my DESeq2 results, but I do not know what kind of IDs they are, and I would like to convert them to Ensembl gene IDs. Is there a package that can identify them and convert them to Ensembl gene IDs? Thanks.

Here are some of the IDs:

CBL14485 CBL14486 CBL14487 CBL14488 CBL14489 CBL14490 CBL14491 CBL14492 CBL14493

deseq2 annotation Tutorial • 1.2k views

updated 4.7 years ago by Martin Morgan • written 4.7 years ago by trumbia

score 0 · Answer 1 · 2019-08-16

thokall ▴ 160

Swedish Museum of Natural History

Hi,

You should be able to extract the information you need by following the section "Using select with EnsDb packages" in the vignette of the AnnotationDbi package.

thokall • 4.7 years ago

My organism is Ruminococcus bromii. Do you know of any annotation package for it?

trumbia • 4.7 years ago

One might be available; otherwise, you can generate one from the information available at Ensembl.

See here. My experience is mostly with eukaryotes, so I am not sure what the case is for bacteria.

thokall • 4.7 years ago

It does not work for this bacterium. Do you know of anything for converting Pseudomonas aureus IDs to Ensembl gene IDs? I also have IDs like EOT21830. Thanks.

trumbia • 4.7 years ago

> library(rtracklayer)
> z <- import("ftp://ftp.ensemblgenomes.org/pub/release-44/bacteria//gtf/bacteria_21_collection/ruminococcus_bromii_l2_63/Ruminococcus_bromii_l2_63.ASM20987v1.44.gtf.gz")

> z
GRanges object with 11138 ranges and 15 metadata columns:
          seqnames          ranges strand |   source        type     score
             <Rle>       <IRanges>  <Rle> | <factor>    <factor> <numeric>
      [1] FP929051        336-1046      + |      ena        gene      <NA>
      [2] FP929051        336-1046      + |      ena  transcript      <NA>
      [3] FP929051        336-1046      + |      ena        exon      <NA>
      [4] FP929051        336-1043      + |      ena         CDS      <NA>
      [5] FP929051         336-338      + |      ena start_codon      <NA>
      ...      ...             ...    ... .      ...         ...       ...
  [11134] FP929051 2248856-2248990      - |      ena  transcript      <NA>
  [11135] FP929051 2248856-2248990      - |      ena        exon      <NA>
  [11136] FP929051 2248859-2248990      - |      ena         CDS      <NA>
  [11137] FP929051 2248988-2248990      - |      ena start_codon      <NA>
  [11138] FP929051 2248856-2248858      - |      ena  stop_codon      <NA>
              phase     gene_id gene_source   gene_biotype transcript_id
          <integer> <character> <character>    <character>   <character>
      [1]      <NA>   RBR_00100         ena protein_coding          <NA>
      [2]      <NA>   RBR_00100         ena protein_coding      CBL14483
      [3]      <NA>   RBR_00100         ena protein_coding      CBL14483
      [4]         0   RBR_00100         ena protein_coding      CBL14483
      [5]         0   RBR_00100         ena protein_coding      CBL14483
      ...       ...         ...         ...            ...           ...
  [11134]      <NA>   RBR_21880         ena protein_coding      CBL16293
  [11135]      <NA>   RBR_21880         ena protein_coding      CBL16293
  [11136]         0   RBR_21880         ena protein_coding      CBL16293
  [11137]         0   RBR_21880         ena protein_coding      CBL16293
  [11138]         0   RBR_21880         ena protein_coding      CBL16293
          transcript_source transcript_biotype exon_number     exon_id
                <character>        <character> <character> <character>
      [1]              <NA>               <NA>        <NA>        <NA>
      [2]               ena     protein_coding        <NA>        <NA>
      [3]               ena     protein_coding           1  CBL14483-1
      [4]               ena     protein_coding           1        <NA>
      [5]               ena     protein_coding           1        <NA>
      ...               ...                ...         ...         ...
  [11134]               ena     protein_coding        <NA>        <NA>
  [11135]               ena     protein_coding           1  CBL16293-1
  [11136]               ena     protein_coding           1        <NA>
  [11137]               ena     protein_coding           1        <NA>
  [11138]               ena     protein_coding           1        <NA>
           protein_id   gene_name transcript_name
          <character> <character>     <character>
      [1]        <NA>        <NA>            <NA>
      [2]        <NA>        <NA>            <NA>
      [3]        <NA>        <NA>            <NA>
      [4]    CBL14483        <NA>            <NA>
      [5]        <NA>        <NA>            <NA>
      ...         ...         ...             ...
  [11134]        <NA>        <NA>            <NA>
  [11135]        <NA>        <NA>            <NA>
  [11136]    CBL16293        <NA>            <NA>
  [11137]        <NA>        <NA>            <NA>
  [11138]        <NA>        <NA>            <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> zz <- c("CBL14485", "CBL14486", "CBL14487", "CBL14488", "CBL14489", "CBL14490", "CBL14491", "CBL14492", "CBL14493")
> zzz <- subset(mcols(z)[,c(5,8)], mcols(z)$transcript_id %in% zz)
> zzz[!duplicated(zzz[,1]),]
DataFrame with 9 rows and 2 columns
      gene_id transcript_id
  <character>   <character>
1   RBR_00120      CBL14485
2   RBR_00130      CBL14486
3   RBR_00140      CBL14487
4   RBR_00150      CBL14488
5   RBR_00160      CBL14489
6   RBR_00170      CBL14490
7   RBR_00180      CBL14491
8   RBR_00190      CBL14492
9   RBR_00200      CBL14493
>

James W. MacDonald • 4.7 years ago
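
Once a lookup table like the one printed above exists, the remaining step is to attach the gene IDs to the DESeq2 results. The following is a minimal base-R sketch of that step, not part of the original answer. It assumes the results table uses the CBL accessions as row names. The `id_map` values are copied from the output above; the `res` table, its `log2FoldChange` numbers, and the ID `CBL99999` are made-up placeholders for illustration only.

```r
# Lookup table: values copied from the transcript_id -> gene_id output above.
# In a real session this could be as.data.frame(zzz[!duplicated(zzz[, 1]), ]).
id_map <- data.frame(
  gene_id       = c("RBR_00120", "RBR_00130", "RBR_00140"),
  transcript_id = c("CBL14485", "CBL14486", "CBL14487"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for as.data.frame(results(dds)).
# The fold changes are invented, and CBL99999 is a made-up ID
# included to show what happens when an ID is missing from the GTF.
res <- data.frame(
  log2FoldChange = c(1.5, -0.7, 0.2, 2.1),
  row.names      = c("CBL14487", "CBL14485", "CBL99999", "CBL14486")
)

# match() gives, for each row name, its position in the lookup table
# (NA when the ID is absent), so the row order of `res` is preserved.
res$gene_id <- id_map$gene_id[match(rownames(res), id_map$transcript_id)]

print(res)
```

Using `match()` rather than `merge()` keeps the rows in their original order and leaves unmatched IDs as `NA`, which makes IDs missing from the annotation easy to spot.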