Question: How to get a list of E. coli genes with their details on the genome?
vinod.acear (India) wrote, 20 months ago:

Hi, is there any way to extract a list (GRanges) of all E. coli genes with their details, such as location, strand, type, and name?

Vincent J. Carey, Jr. (United States) wrote, 20 months ago:

There is a complication: research data on E. coli is typically organized by strain.  Once you've picked a strain among those listed in the Ensembl bacterial genomes collection

http://bacteria.ensembl.org/index.html

you can download one, e.g.,

ftp://ftp.ensemblgenomes.org/pub/bacteria/release-30/gff3/bacteria_90_collection/escherichia_coli_k_12/Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz

and then use rtracklayer to import

> library(rtracklayer)
> ec1 = import("Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz")
> head(ec1,2)
GRanges object with 2 ranges and 19 metadata columns:
        seqnames     ranges strand |   source       type     score     phase
           <Rle>  <IRanges>  <Rle> | <factor>   <factor> <numeric> <integer>
  [1] Chromosome [190, 255]      + |      ena       gene      <NA>      <NA>
  [2] Chromosome [190, 255]      + |      ena transcript      <NA>      <NA>
                       ID        Name        biotype               description
              <character> <character>    <character>               <character>
  [1]    gene:ER3413_4519        thrL protein_coding thr operon leader peptide
  [2] transcript:AIZ54182      thrL-1 protein_coding                      <NA>
          gene_id  logic_name     version           Parent transcript_id
      <character> <character> <character>  <CharacterList>   <character>
  [1] ER3413_4519         ena           1                           <NA>
  [2]        <NA>        <NA>           1 gene:ER3413_4519      AIZ54182
      constitutive ensembl_end_phase ensembl_phase     exon_id        rank
       <character>       <character>   <character> <character> <character>
  [1]         <NA>              <NA>          <NA>        <NA>        <NA>
  [2]         <NA>              <NA>          <NA>        <NA>        <NA>
       protein_id
      <character>
  [1]        <NA>
  [2]        <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
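
The imported object mixes gene, transcript and other feature rows, so the details asked about in the question (location, strand, type, name) can be pulled out by keeping only the gene-level records. The following is a minimal sketch, assuming the object carries the type, Name, gene_id, biotype and description columns shown in the output. The names gene_table and toy are illustrative only and are not part of rtracklayer; the toy object just mimics the two rows printed above so the code runs without the downloaded file.

```r
library(GenomicRanges)

# Illustrative helper (not an rtracklayer function): keep gene-level rows and
# return a plain data.frame with coordinates, strand and selected annotation.
gene_table <- function(gr, keep = c("gene_id", "Name", "biotype", "description")) {
  genes <- gr[!is.na(gr$type) & gr$type == "gene"]
  # Only request columns that actually exist in this file.
  keep <- intersect(keep, names(mcols(genes)))
  mcols(genes) <- mcols(genes)[, keep, drop = FALSE]
  as.data.frame(genes)
}

# Toy stand-in for the imported object, with values copied from the output.
# With the real data, call gene_table(ec1) instead.
toy <- GRanges(
  seqnames = rep("Chromosome", 2),
  ranges = IRanges(start = c(190, 190), end = c(255, 255)),
  strand = rep("+", 2),
  type = factor(c("gene", "transcript")),
  Name = c("thrL", "thrL-1"),
  gene_id = c("ER3413_4519", NA),
  biotype = c("protein_coding", "protein_coding"),
  description = c("thr operon leader peptide", NA)
)

res <- gene_table(toy)
print(res)
```

The result has one row per gene, with seqnames, start, end, width and strand followed by the annotation columns. If a GRanges is preferred over a data.frame, the same subsetting line (gr[gr$type == "gene"]) can be used on its own.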

 

This should get you going.  To move further in the direction of an OrganismDb resource for this organism, consider the material about halfway down in

http://genomicsclass.github.io/book/pages/bioc2_rpacks.html

where the steps required to make Sac.cer3 are reviewed.  In this case you would unite the coordinate information acquired from Ensembl with GO and the ENTREZ information in org.Ec.eg.db ....
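
As a rough sketch of that last uniting step, under several assumptions: that the K-12 annotation package is org.EcK12.eg.db (the post only says org.Ec.eg.db), that it exposes SYMBOL, ENTREZID and GO through the usual AnnotationDbi keys()/select() interface, and that the Name column from the Ensembl file matches its gene symbols. The helper annotate_genes and the input gene_df (a data.frame of genes with a Name column) are hypothetical names used for illustration.

```r
# Hypothetical helper: attach Entrez IDs and GO terms to a gene table by symbol.
# gene_df : data.frame with a Name column (gene symbols taken from the GFF3 file)
# orgdb   : an OrgDb object, assumed here to be org.EcK12.eg.db
annotate_genes <- function(gene_df, orgdb, columns = c("ENTREZID", "GO")) {
  known <- AnnotationDbi::keys(orgdb, keytype = "SYMBOL")
  syms <- intersect(unique(gene_df$Name[!is.na(gene_df$Name)]), known)
  if (length(syms) == 0) {
    stop("none of the gene names match a SYMBOL in the annotation package")
  }
  ann <- AnnotationDbi::select(orgdb, keys = syms, keytype = "SYMBOL",
                               columns = columns)
  # Left join: genes without a matching symbol keep NA in the annotation columns.
  merge(gene_df, ann, by.x = "Name", by.y = "SYMBOL", all.x = TRUE)
}

# Usage (requires the Bioconductor annotation package to be installed):
# library(org.EcK12.eg.db)
# annotated <- annotate_genes(gene_df, org.EcK12.eg.db)
```

Because a gene can carry several GO terms, the merged table may contain more than one row per gene. Matching on symbols is a convenience and may miss genes whose names differ between Ensembl and Entrez.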


Thanks Vincent, it worked for me. 

Reply written 20 months ago by vinod.acear