Annotation information for ecoli2.db
cporter • 0
@cporter-7549
Last seen 9.0 years ago
United States

Hello, 

I have been working with the ecoli2.db to analyze Affymetrix arrays for E. coli K12 MG1655. Our samples were run on the E. coli 2.0 Affy array, which has probes for 4 strains of E. coli (K12 plus 3 pathogenic strains).

I have been comparing the Entrez IDs present in the ecoli2.db package to those that appear to be specific to K12 in the Affy annotation file, and I can't get the two to match exactly. My goal is to create a list of K12-specific probe IDs so that I can pull out array data for just the K12 strain. I have not found specific disagreements between ecoli2.db and the Affy annotation file (e.g., a probe ID linking to different Entrez IDs), but the list of probe IDs from ecoli2.db does not match any K12-specific list I generate from the Affy annotation file. The lists differ by a couple hundred genes (i.e., some genes are included in only one list).

My questions are: 1) Is ecoli2.db specific to the K12 probes from the Affy array? and 2) Where were the annotations collected from, and why might they differ from the Affy annotation file?

Thanks so much for any insights, 

Caroline

@james-w-macdonald-5106
Last seen 13 hours ago
United States

The package is specific for the K-12 strain, which is sort of stated when you load:

> library(ecoli2.db)
Loading required package: org.EcK12.eg.db

ecoli2.db is providing annotations for only one of the species that are
  supported by this platform. You may want to get other annotations
  from other sources/packages in order to cover all the species that
  are represented by probes on this platform.

This could be a little more specific, stating that the species [sic] being supported by the platform is the K12 strain, but I think it is fair to infer that this is so, given that we are also loading the required org.EcK12.eg.db package.
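As a rough sketch of how one might pull out the probesets the package maps to K-12 genes (assuming ecoli2.db is installed, and assuming that `keys()` with the `PROBEID` keytype returns every probeset known to the package, mapped or not):

```r
library(ecoli2.db)  # also loads AnnotationDbi and org.EcK12.eg.db

# Probesets that have at least one Entrez Gene ID in the package
k12_probes <- mappedkeys(ecoli2ENTREZID)

# All probesets known to the package (assumed to include unmapped ones)
all_probes <- keys(ecoli2.db, keytype = "PROBEID")

length(k12_probes)
# Probesets with no K-12 Entrez Gene ID in this package
length(setdiff(all_probes, k12_probes))
```

The counts will depend on the package version, so none are shown here.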

It seems to me that the help pages used to be more explicit about the source, but these days you get something like

ecoli2ENTREZID            package:ecoli2.db            R Documentation

Map between Manufacturer Identifiers and Entrez Gene

Description:

     ecoli2ENTREZID is an R object that provides mappings between
     manufacturer identifiers and Entrez Gene identifiers.

Details:

     Each manufacturer identifier is mapped to a vector of Entrez Gene
     identifiers. An ‘NA’ is assigned to those manufacturer identifiers
     that can not be mapped to an Entrez Gene identifier at this time.

     If a given manufacturer identifier can be mapped to different
     Entrez Gene identifiers from various sources, we attempt to select
     the common identifiers. If a concensus cannot be determined, we
     select the smallest identifier.

     Mappings were based on data provided by: Entrez Gene
     ftp://ftp.ncbi.nlm.nih.gov/gene/DATA With a date stamp from the
     source of: 2014-Sep19

The primary mapping is the probeset -> Entrez Gene ID mapping (or sometimes probeset -> GenBank/RefSeq ID) found in the Affymetrix annotation file. Once the probeset -> Entrez Gene ID mapping is extracted from that file, the remaining mappings (Entrez Gene ID -> whatever) are based on data from NCBI.
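To make the two stages concrete, here is a toy illustration in base R. It is not the actual package build code; the data frames, probeset names, and IDs are all made up for the example.

```r
# Hypothetical toy data; every ID and symbol here is invented
affy_annot <- data.frame(
  PROBEID  = c("ps_1", "ps_2", "ps_3"),
  ENTREZID = c("1001", "1002", NA),   # ps_3 has no Entrez Gene ID
  stringsAsFactors = FALSE
)
ncbi_gene <- data.frame(
  ENTREZID = c("1001", "1002"),
  SYMBOL   = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Stage 1: probeset -> Entrez Gene ID comes from the manufacturer's file.
# Stage 2: Entrez Gene ID -> other fields comes from NCBI data.
annot <- merge(affy_annot, ncbi_gene, by = "ENTREZID", all.x = TRUE)
annot <- annot[order(annot$PROBEID), c("PROBEID", "ENTREZID", "SYMBOL")]
print(annot)
```

A probeset with no Entrez Gene ID in the first table simply ends up with NA in every downstream column.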

cporter • 0
@cporter-7549
Last seen 9.0 years ago
United States

Thank you very much for this confirmation. I did notice that the ecoli2.db package depended on the org.EcK12.eg.db package, but I wasn't sure if ecoli2.db might have been expanded beyond the basic K12 information to include the other strains from the array. It's helpful to know that is not the case here.

Thanks also for the explanation of how the mapping was made - after some more time thinking about it, I was actually able to recreate the ecoli2.db Entrez mapping by taking the intersection of the E. coli K12 Entrez IDs from NCBI and all of the Entrez IDs listed in the Affy annotation file. I'm glad that what I did more or less lines up with the ecoli2.db source info.
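A minimal sketch of that kind of intersection in base R, using made-up IDs rather than the real NCBI or Affy data (the variable names are hypothetical too):

```r
# Hypothetical IDs for illustration only
k12_entrez_ncbi <- c("1001", "1002", "1003")          # K12 genes listed at NCBI
affy_entrez     <- c("1001", "1002", "2001", "2002")  # all Entrez IDs in the Affy file

# K12 genes that are represented on the array
k12_on_array <- intersect(k12_entrez_ncbi, affy_entrez)

# K12 genes missing from the Affy file, and Affy IDs that are not K12
not_on_array  <- setdiff(k12_entrez_ncbi, affy_entrez)
other_strains <- setdiff(affy_entrez, k12_entrez_ncbi)
```

The two `setdiff()` calls are a quick way to see which genes fall into only one of the lists.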


Thank you again!
