Question

How to get a list of E. coli genes with their details on the genome?

vinod.acear ▴ 50


Hi, is there any way to extract a list (GRanges) of all E. coli genes with details such as location, strand, and name?

biomart maketxdbfrombiomart granges • 2.3k views

updated 8.8 years ago by Vincent J. Carey, Jr. 6.7k • written 8.8 years ago by vinod.acear ▴ 50

score 2 · Accepted Answer · 2016-03-12

There is a complication: research data on E. coli are typically organized by strain. Once you've picked a strain from those listed in the Ensembl bacterial genomes collection

http://bacteria.ensembl.org/index.html

you can download one, e.g.,

ftp://ftp.ensemblgenomes.org/pub/bacteria/release-30/gff3/bacteria_90_collection/escherichia_coli_k_12/Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz

and then use rtracklayer to import

> library(rtracklayer)

> ec1 = import("Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz")

> head(ec1,2)

GRanges object with 2 ranges and 19 metadata columns:
        seqnames     ranges strand |   source       type     score     phase
           <Rle>  <IRanges>  <Rle> | <factor>   <factor> <numeric> <integer>
  [1] Chromosome [190, 255]      + |      ena       gene      <NA>      <NA>
  [2] Chromosome [190, 255]      + |      ena transcript      <NA>      <NA>
                       ID        Name        biotype               description
              <character> <character>    <character>               <character>
  [1]    gene:ER3413_4519        thrL protein_coding thr operon leader peptide
  [2] transcript:AIZ54182      thrL-1 protein_coding                      <NA>
          gene_id  logic_name     version           Parent transcript_id
      <character> <character> <character>  <CharacterList>   <character>
  [1] ER3413_4519         ena           1                           <NA>
  [2]        <NA>        <NA>           1 gene:ER3413_4519      AIZ54182
      constitutive ensembl_end_phase ensembl_phase     exon_id        rank
       <character>       <character>   <character> <character> <character>
  [1]         <NA>              <NA>          <NA>        <NA>        <NA>
  [2]         <NA>              <NA>          <NA>        <NA>        <NA>
       protein_id
      <character>
  [1]        <NA>
  [2]        <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
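The imported object mixes genes, transcripts, and other feature types, so for a gene-only list you will probably want to subset on the type column. Here is a minimal sketch of that step. The helper name gene_ranges and its keep argument are made up for illustration (they are not part of rtracklayer or GenomicRanges), and the column names are simply the ones visible in the output above, so they may differ for other GFF3 files.

```r
library(GenomicRanges)  # GRanges class and mcols() accessor

# Hypothetical helper (not from any package): keep only the features whose
# `type` is "gene", and only a few descriptive metadata columns.
gene_ranges <- function(gr,
                        keep = c("gene_id", "Name", "biotype", "description")) {
  # %in% treats a missing type as FALSE, so no NA subscripts are produced
  genes <- gr[gr$type %in% "gene"]
  # Retain only the requested columns that are actually present
  cols <- intersect(keep, colnames(mcols(genes)))
  mcols(genes) <- mcols(genes)[, cols, drop = FALSE]
  genes
}

# Applied to the imported object, assuming `ec1` exists in the session
if (exists("ec1")) {
  ec_genes <- gene_ranges(ec1)
  # Flat table: seqnames, start, end, width, strand, plus the kept columns
  ec_gene_table <- as.data.frame(ec_genes)
}
```

The result is still a GRanges, so location and strand stay attached to each gene name, and as.data.frame() gives a plain table if that is easier to work with.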

This should get you going. To move further in the direction of an OrganismDb resource for this organism, consider the material about halfway down in

http://genomicsclass.github.io/book/pages/bioc2_rpacks.html

where the steps required to make Sac.cer3 are reviewed. In this case you would unite the coordinate information acquired from Ensembl with the GO and Entrez information in org.Ec.eg.db ....