How to get a list of E. coli genes with their details on the genome?
vinod.acear ▴ 50
@vinodacear-8884
Last seen 4.2 years ago
India

Hi, is there any way to extract a list (as a GRanges object) of all E. coli genes with details such as location, strand, type, and name?

biomart maketxdbfrombiomart granges • 2.2k views
@vincent-j-carey-jr-4
Last seen 9 weeks ago
United States

There is a complication: research data on E. coli are typically organized by strain. Once you've picked a strain among those listed in the Ensembl Bacteria genomes collection

http://bacteria.ensembl.org/index.html

you can download one, e.g.,

ftp://ftp.ensemblgenomes.org/pub/bacteria/release-30/gff3/bacteria_90_collection/escherichia_coli_k_12/Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz

and then use rtracklayer to import

> library(rtracklayer)
> ec1 = import("Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz")
> head(ec1,2)
GRanges object with 2 ranges and 19 metadata columns:
        seqnames     ranges strand |   source       type     score     phase
           <Rle>  <IRanges>  <Rle> | <factor>   <factor> <numeric> <integer>
  [1] Chromosome [190, 255]      + |      ena       gene      <NA>      <NA>
  [2] Chromosome [190, 255]      + |      ena transcript      <NA>      <NA>
                       ID        Name        biotype               description
              <character> <character>    <character>               <character>
  [1]    gene:ER3413_4519        thrL protein_coding thr operon leader peptide
  [2] transcript:AIZ54182      thrL-1 protein_coding                      <NA>
          gene_id  logic_name     version           Parent transcript_id
      <character> <character> <character>  <CharacterList>   <character>
  [1] ER3413_4519         ena           1                           <NA>
  [2]        <NA>        <NA>           1 gene:ER3413_4519      AIZ54182
      constitutive ensembl_end_phase ensembl_phase     exon_id        rank
       <character>       <character>   <character> <character> <character>
  [1]         <NA>              <NA>          <NA>        <NA>        <NA>
  [2]         <NA>              <NA>          <NA>        <NA>        <NA>
       protein_id
      <character>
  [1]        <NA>
  [2]        <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
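
The imported object mixes genes, transcripts, and other feature types, so to get one range per gene you would probably want to subset on the type column. The following is only a sketch: `extract_genes` is a hypothetical helper name (not part of rtracklayer), and the choice of metadata columns to keep is an assumption based on the columns shown in the output above.

```r
library(rtracklayer)  # provides import(); attaches GenomicRanges as a dependency

# Hypothetical helper: import a GFF3 file and keep only gene-level features.
extract_genes <- function(gff_path) {
  feats <- import(gff_path)
  # Drop transcripts, exons, CDS, etc.; keep rows whose type is "gene".
  genes <- feats[feats$type == "gene"]
  # Keep a few descriptive columns, skipping any that are absent in the file.
  wanted <- c("gene_id", "Name", "biotype", "description")
  keep <- intersect(wanted, colnames(mcols(genes)))
  mcols(genes) <- mcols(genes)[, keep, drop = FALSE]
  genes
}

# Assumes the downloaded file sits in the working directory.
gff_file <- "Escherichia_coli_k_12.GCA_000800765.1.30.gff3.gz"
if (file.exists(gff_file)) {
  genes <- extract_genes(gff_file)
  # Flatten to a data.frame with seqnames, start, end, width, strand, and names.
  gene_table <- as.data.frame(genes)
  print(head(gene_table))
}
```

Location and strand live in the ranges themselves (`start()`, `end()`, `strand()`), while the gene name is in the `Name` metadata column.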

 

This should get you going.  To move further in the direction of an OrganismDb resource for this organism, consider the material about halfway down in

http://genomicsclass.github.io/book/pages/bioc2_rpacks.html

where the steps required to make Sac.cer3 are reviewed.  In this case you would unite the coordinate information acquired from Ensembl with the GO and Entrez information in the E. coli org package (org.EcK12.eg.db for strain K-12) ....
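
As a rough illustration of that last step, gene symbols can be looked up in an org package through the AnnotationDbi `select()` interface. This sketch assumes the K-12 package org.EcK12.eg.db is installed and that the Ensembl `Name` values match its `SYMBOL` keys, which may not hold for every gene. `annotate_symbols` is a hypothetical helper name.

```r
library(AnnotationDbi)
library(org.EcK12.eg.db)  # assumption: K-12 annotation package is the right one

# Hypothetical helper: map gene symbols to Entrez IDs and GO terms.
annotate_symbols <- function(symbols, orgdb = org.EcK12.eg.db) {
  symbols <- unique(symbols[!is.na(symbols)])
  # select() fails if no key is valid, so keep only symbols known to the package.
  known <- intersect(symbols, keys(orgdb, keytype = "SYMBOL"))
  if (length(known) == 0L) return(NULL)
  select(orgdb, keys = known, keytype = "SYMBOL",
         columns = c("ENTREZID", "GO"))
}

# Example symbols; in practice these could come from the Name column of ec1.
head(annotate_symbols(c("thrL", "thrA")))
```

Symbols that are missing from the package are silently dropped here, so it is worth checking how many genes survive the lookup before building anything on top of it.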


Thanks Vincent, it worked for me. 
