Question: Ensembl gene ID conversion
Posted 3 months ago by trumbia:

Hello,

I have some IDs in my DESeq2 results, but I do not know what type of IDs they are. I also want to convert them to Ensembl gene IDs. Is there a package that can identify these IDs and convert them to Ensembl gene IDs? Thanks.

CBL14485 CBL14486 CBL14487 CBL14488 CBL14489 CBL14490 CBL14491 CBL14492 CBL14493

Those are some of the IDs.

Tags: annotation, deseq2, tutorial
Modified 3 months ago by Martin Morgan.

Answer: Ensembl gene ID conversion
Posted 3 months ago by thokall (Swedish Museum of Natural History):

Hi,

You should be able to extract the information you need from the section "Using select with EnsDb packages" in the vignette for the package "AnnotationDbi".

My organism is Ruminococcus bromii. Do you know of any library for it?

(reply by trumbia, 3 months ago)

It might be available; otherwise, one can generate one from the information available at Ensembl.

See here. My experience is mostly with eukaryotes, so I am not sure what the situation is for bacteria.

(reply by thokall, 3 months ago)
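Reading the Ensembl GTF annotation directly is one way to build an ID mapping without a prebuilt annotation package. The sketch below uses only base R (no rtracklayer or AnnotationDbi) and assumes the standard Ensembl GTF layout: nine tab-separated columns, with `gene_id` and `transcript_id` stored in column 9 as `key "value";` pairs. The function name `gtf_tx2gene` is hypothetical, and the example records are illustrative lines built from the R. bromii IDs and coordinates that appear in this thread.

```r
# Minimal base-R sketch (hypothetical helper, not a package function):
# build a transcript_id -> gene_id table from Ensembl-style GTF lines.
# Assumes 9 tab-separated columns, attributes in column 9 as key "value";
gtf_tx2gene <- function(lines) {
  lines  <- lines[!startsWith(lines, "#")]        # drop header/comment lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  type   <- vapply(fields, `[`, character(1), 3)  # column 3: feature type
  attrs  <- vapply(fields, `[`, character(1), 9)  # column 9: attributes
  attrs  <- attrs[type %in% "transcript"]         # one line per transcript

  # Extract the quoted value of one attribute key (NA if the key is absent)
  get_attr <- function(x, key) {
    pattern <- paste0('(^|[; ])', key, ' "([^"]*)"')
    m <- regmatches(x, regexec(pattern, x))
    vapply(m, function(v) if (length(v) == 3) v[3] else NA_character_,
           character(1))
  }

  data.frame(transcript_id = get_attr(attrs, "transcript_id"),
             gene_id       = get_attr(attrs, "gene_id"),
             stringsAsFactors = FALSE)
}

# Illustrative input shaped like the R. bromii GTF records in this thread
example_gtf <- c(
  "# illustrative header line",
  paste("FP929051", "ena", "gene", "336", "1046", ".", "+", ".",
        'gene_id "RBR_00100"; gene_source "ena"; gene_biotype "protein_coding";',
        sep = "\t"),
  paste("FP929051", "ena", "transcript", "336", "1046", ".", "+", ".",
        'gene_id "RBR_00100"; transcript_id "CBL14483"; gene_biotype "protein_coding";',
        sep = "\t"),
  paste("FP929051", "ena", "exon", "336", "1046", ".", "+", ".",
        'gene_id "RBR_00100"; transcript_id "CBL14483"; exon_number "1"; exon_id "CBL14483-1";',
        sep = "\t")
)

tx2gene <- gtf_tx2gene(example_gtf)
print(tx2gene)
```

For the real annotation, the same function could be given `readLines()` of a locally downloaded copy of Ruminococcus_bromii_l2_63.ASM20987v1.44.gtf.gz, since `readLines()` opens gzip-compressed files directly. Keeping only `transcript` lines yields one row per transcript, so no de-duplication step is needed.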

It does not work for this bacterium. Do you know of anything for Pseudomonas aureus to convert IDs to Ensembl gene IDs? Also, what about IDs like EOT21830? Thanks.

(reply by trumbia, 12 weeks ago)
> library(rtracklayer)
> z <- import("ftp://ftp.ensemblgenomes.org/pub/release-44/bacteria//gtf/bacteria_21_collection/ruminococcus_bromii_l2_63/Ruminococcus_bromii_l2_63.ASM20987v1.44.gtf.gz")

> z
GRanges object with 11138 ranges and 15 metadata columns:
          seqnames          ranges strand |   source        type     score
             <Rle>       <IRanges>  <Rle> | <factor>    <factor> <numeric>
      [1] FP929051        336-1046      + |      ena        gene      <NA>
      [2] FP929051        336-1046      + |      ena  transcript      <NA>
      [3] FP929051        336-1046      + |      ena        exon      <NA>
      [4] FP929051        336-1043      + |      ena         CDS      <NA>
      [5] FP929051         336-338      + |      ena start_codon      <NA>
      ...      ...             ...    ... .      ...         ...       ...
  [11134] FP929051 2248856-2248990      - |      ena  transcript      <NA>
  [11135] FP929051 2248856-2248990      - |      ena        exon      <NA>
  [11136] FP929051 2248859-2248990      - |      ena         CDS      <NA>
  [11137] FP929051 2248988-2248990      - |      ena start_codon      <NA>
  [11138] FP929051 2248856-2248858      - |      ena  stop_codon      <NA>
              phase     gene_id gene_source   gene_biotype transcript_id
          <integer> <character> <character>    <character>   <character>
      [1]      <NA>   RBR_00100         ena protein_coding          <NA>
      [2]      <NA>   RBR_00100         ena protein_coding      CBL14483
      [3]      <NA>   RBR_00100         ena protein_coding      CBL14483
      [4]         0   RBR_00100         ena protein_coding      CBL14483
      [5]         0   RBR_00100         ena protein_coding      CBL14483
      ...       ...         ...         ...            ...           ...
  [11134]      <NA>   RBR_21880         ena protein_coding      CBL16293
  [11135]      <NA>   RBR_21880         ena protein_coding      CBL16293
  [11136]         0   RBR_21880         ena protein_coding      CBL16293
  [11137]         0   RBR_21880         ena protein_coding      CBL16293
  [11138]         0   RBR_21880         ena protein_coding      CBL16293
          transcript_source transcript_biotype exon_number     exon_id
                <character>        <character> <character> <character>
      [1]              <NA>               <NA>        <NA>        <NA>
      [2]               ena     protein_coding        <NA>        <NA>
      [3]               ena     protein_coding           1  CBL14483-1
      [4]               ena     protein_coding           1        <NA>
      [5]               ena     protein_coding           1        <NA>
      ...               ...                ...         ...         ...
  [11134]               ena     protein_coding        <NA>        <NA>
  [11135]               ena     protein_coding           1  CBL16293-1
  [11136]               ena     protein_coding           1        <NA>
  [11137]               ena     protein_coding           1        <NA>
  [11138]               ena     protein_coding           1        <NA>
           protein_id   gene_name transcript_name
          <character> <character>     <character>
      [1]        <NA>        <NA>            <NA>
      [2]        <NA>        <NA>            <NA>
      [3]        <NA>        <NA>            <NA>
      [4]    CBL14483        <NA>            <NA>
      [5]        <NA>        <NA>            <NA>
      ...         ...         ...             ...
  [11134]        <NA>        <NA>            <NA>
  [11135]        <NA>        <NA>            <NA>
  [11136]    CBL16293        <NA>            <NA>
  [11137]        <NA>        <NA>            <NA>
  [11138]        <NA>        <NA>            <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
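> zz <- c("CBL14485", "CBL14486", "CBL14487", "CBL14488", "CBL14489", "CBL14490", "CBL14491", "CBL14492", "CBL14493")
> # keep the gene_id and transcript_id columns for rows whose transcript_id is one of the query IDs
> zzz <- subset(mcols(z)[, c("gene_id", "transcript_id")], mcols(z)$transcript_id %in% zz)
> # each transcript spans several rows (transcript, exon, CDS, ...); keep one row per gene
> zzz[!duplicated(zzz$gene_id), ]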
DataFrame with 9 rows and 2 columns
      gene_id transcript_id
  <character>   <character>
1   RBR_00120      CBL14485
2   RBR_00130      CBL14486
3   RBR_00140      CBL14487
4   RBR_00150      CBL14488
5   RBR_00160      CBL14489
6   RBR_00170      CBL14490
7   RBR_00180      CBL14491
8   RBR_00190      CBL14492
9   RBR_00200      CBL14493
> 
(reply by James W. MacDonald, 12 weeks ago)
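Since the original question is about DESeq2 results, a natural last step is to attach the gene IDs to the results table. A minimal base-R sketch, assuming the results have been coerced to a data frame (for example with `as.data.frame()`) whose row names are the CBL IDs; `res` and `tx2gene` are hypothetical names, the mapping rows are copied from the output above, and the fold-change values are placeholders only.

```r
# Hypothetical stand-in for a DESeq2 results table: row names are the
# CBL IDs; the log2FoldChange values are placeholders, not real results.
res <- data.frame(log2FoldChange = c(0.5, -1.0, 2.0),
                  row.names = c("CBL14487", "CBL14485", "CBL14486"))

# Mapping copied from the gene_id/transcript_id pairs listed in this thread
tx2gene <- data.frame(
  transcript_id = c("CBL14485", "CBL14486", "CBL14487"),
  gene_id       = c("RBR_00120", "RBR_00130", "RBR_00140"),
  stringsAsFactors = FALSE
)

# match() looks up each results row in the mapping by ID, so the two tables
# need not share a row order; IDs absent from the mapping would get NA.
res$gene_id <- tx2gene$gene_id[match(rownames(res), tx2gene$transcript_id)]
print(res)
```

Using `match()` rather than `merge()` keeps the results table's original row order and row names intact.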