How to get a list of E. coli genes with their details on the genome?
vinod.acear ▴ 50

Hi, is there any way to extract a list (GRanges) of all E. coli genes with their details, such as location, strand, type, and name?

biomart maketxdbfrombiomart granges • 1.3k views
United States

There is a complication: research data on E. coli are typically organized by strain. Once you've picked a strain among those listed at the Ensembl bacterial genomes collection

you can download one, e.g.,

and then use rtracklayer to import

> library(rtracklayer)
> ec1 = import("Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz")
> head(ec1,2)
GRanges object with 2 ranges and 19 metadata columns:
        seqnames     ranges strand |   source       type     score     phase
           <Rle>  <IRanges>  <Rle> | <factor>   <factor> <numeric> <integer>
  [1] Chromosome [190, 255]      + |      ena       gene      <NA>      <NA>
  [2] Chromosome [190, 255]      + |      ena transcript      <NA>      <NA>
                       ID        Name        biotype               description
              <character> <character>    <character>               <character>
  [1]    gene:ER3413_4519        thrL protein_coding thr operon leader peptide
  [2] transcript:AIZ54182      thrL-1 protein_coding                      <NA>
          gene_id  logic_name     version           Parent transcript_id
      <character> <character> <character>  <CharacterList>   <character>
  [1] ER3413_4519         ena           1                           <NA>
  [2]        <NA>        <NA>           1 gene:ER3413_4519      AIZ54182
      constitutive ensembl_end_phase ensembl_phase     exon_id        rank
       <character>       <character>   <character> <character> <character>
  [1]         <NA>              <NA>          <NA>        <NA>        <NA>
  [2]         <NA>              <NA>          <NA>        <NA>        <NA>

  [1]        <NA>
  [2]        <NA>

  seqinfo: 1 sequence from an unspecified genome; no seqlengths
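
The imported object mixes genes, transcripts and other feature types, so one more step is usually needed to get a genes-only GRanges. Here is a minimal sketch of that step, not part of the original answer. To keep it self-contained it builds a toy `ec1` by hand from the two rows shown above (an assumption for illustration; in practice you would use the object returned by `import()`), and the choice of which metadata columns to keep is likewise just an example.

```r
library(GenomicRanges)

# Toy stand-in for the imported object, mirroring the two rows printed above.
# In real use, ec1 would be the result of rtracklayer::import() on the GFF3 file.
ec1 <- GRanges(
  seqnames = "Chromosome",
  ranges = IRanges(start = c(190, 190), end = c(255, 255)),
  strand = "+",
  type = factor(c("gene", "transcript")),
  Name = c("thrL", "thrL-1"),
  biotype = c("protein_coding", "protein_coding"),
  description = c("thr operon leader peptide", NA)
)

# Keep only the gene-level features.
genes <- ec1[ec1$type == "gene"]

# Keep only the metadata columns of interest (illustrative selection).
mcols(genes) <- mcols(genes)[, c("Name", "biotype", "description")]

# Optionally flatten to a data.frame: one row per gene with
# seqnames, start, end, width, strand and the kept metadata columns.
gene_table <- as.data.frame(genes)
```

`genes` remains a GRanges, so it can be passed directly to overlap or annotation functions; `gene_table` is only a convenience for inspection or export.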


This should get you going.  To move further in the direction of an OrganismDb resource for this organism, consider the material about halfway down in

where the steps required to make Sac.cer3 are reviewed.  In this case you would unite the coordinate information acquired from ensembl with GO and the ENTREZ information in ....


Thanks Vincent, it worked for me. 

