Question: How to get a list of E. coli genes with their details on the genome?
2.8 years ago by
vinod.acear30 wrote:

Hi, is there any way to extract a list (GRanges) of all E. coli genes with details such as location, strand, type, and name?

2.8 years ago by
United States
Vincent J. Carey, Jr.6.2k wrote:

There is a complication: research data on E. coli are typically organized by strain. Once you've picked a strain among those listed in the Ensembl bacterial genomes collection, you can download one, e.g.,

and then use rtracklayer to import

> library(rtracklayer)
> ec1 = import("Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz")

> head(ec1,2)
GRanges object with 2 ranges and 19 metadata columns:
        seqnames     ranges strand |   source       type     score     phase
           <Rle>  <IRanges>  <Rle> | <factor>   <factor> <numeric> <integer>
  [1] Chromosome [190, 255]      + |      ena       gene      <NA>      <NA>
  [2] Chromosome [190, 255]      + |      ena transcript      <NA>      <NA>
                       ID        Name        biotype               description
              <character> <character>    <character>               <character>
  [1]    gene:ER3413_4519        thrL protein_coding thr operon leader peptide
  [2] transcript:AIZ54182      thrL-1 protein_coding                      <NA>
          gene_id  logic_name     version           Parent transcript_id
      <character> <character> <character>  <CharacterList>   <character>
  [1] ER3413_4519         ena           1                           <NA>
  [2]        <NA>        <NA>           1 gene:ER3413_4519      AIZ54182
      constitutive ensembl_end_phase ensembl_phase     exon_id        rank
       <character>       <character>   <character> <character> <character>
  [1]         <NA>              <NA>          <NA>        <NA>        <NA>
  [2]         <NA>              <NA>          <NA>        <NA>        <NA>

  [1]        <NA>
  [2]        <NA>
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
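
The import contains every feature type in the GFF3 file (genes, transcripts, and so on), so to get the gene list asked for in the question, the object still needs to be narrowed down. A minimal sketch of one way to do that follows. The helper name genes_only is hypothetical, and the metadata column names (type, Name, biotype, description) are assumed from the printed output of this particular file; other GFF3 files may name them differently.

```r
library(rtracklayer)    # import()
library(GenomicRanges)  # GRanges class and accessors

# Hypothetical helper: keep only ranges whose GFF3 feature type is "gene"
# and retain a few descriptive metadata columns.
# Column names are assumed from the printed output of this file.
genes_only <- function(gr) {
  g <- gr[gr$type == "gene"]
  mcols(g) <- mcols(g)[, c("type", "Name", "biotype", "description")]
  g
}

# Usage with the file from the answer, if it has been downloaded
# into the working directory.
gff <- "Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz"
if (file.exists(gff)) {
  ec_genes <- genes_only(import(gff))
  # as.data.frame() puts seqnames, start, end, width and strand
  # in front of the retained metadata columns.
  print(head(as.data.frame(ec_genes)))
}
```

The result is still a GRanges object, so location and strand remain available through start(), end() and strand(); converting with as.data.frame() is only needed if a flat table is preferred.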


This should get you going.  To move further in the direction of an OrganismDb resource for this organism, consider the material about halfway down in

where the steps required to make Sac.cer3 are reviewed.  In this case you would unite the coordinate information acquired from Ensembl with GO and the Entrez information in ....


Thanks Vincent, it worked for me. 

modified 2.8 years ago • written 2.8 years ago by vinod.acear30