450K annotation: discrepancy between GEO GPL and Bioconductor annotation

Marc Carlson

Hi Tom,

Tim is right about the bimaps. Bimaps were invented to mimic the behavior of R environments, which were originally aimed at supporting expression arrays. If you really insist on using the bimaps, you could use the toggleProbes() method he described to "unhide" your mappings. This method was added to help with situations like this one, where people really wanted to use probes that map to multiple IDs.

Or (and I think this is probably an even better option for you) you could just use the new select interface to extract these things. Select doesn't have to play these games, because the legacy code that expected the more restrictive behavior was written before we implemented select. This freed us to do things a bit more universally in its implementation. You can learn more about the new select interface here:

http://www.bioconductor.org/packages/2.11/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf

Hope this helps,

Marc

On 05/16/2012 03:59 PM, Tim Triche, Jr. wrote:
> toggleProbes() masks values where a probe is annotated to multiple transcripts as 'NONE' or 'NA' by default. Unfortunately, many (thousands) of the 450k probes are mapped to multiple transcripts in the manifest, and by default, the automatically generated bimap objects will treat them as if they were (degenerate) expression probes, masking them.
>
> I am attempting to address this by replacing the 450k.db, 27k.db, and 450kprobe packages with a faster, smaller, FeatureDb-based omnibus package that keeps track of the minimal information required to mask probes, annotate regions of interest, and process IDAT files, with all other operations (distance to TSS, chromosome, GC%, etc.) delegated to GenomicRanges and GenomicFeatures. In my experience this makes much more sense than using a framework that was originally created for expression probes.
> I didn't realize the difference when I first packaged the annotations into a SQLite database, which is why the 450k.db package uses the db0 machinery.
>
> Apologies for the confusion; hopefully this will be a memory as soon as I am up to speed on creating FeatureDb objects.
>
> --t
>
> On Wed, May 16, 2012 at 12:04 PM, Bartlett, Thomas <thomas.bartlett.10 at ucl.ac.uk> wrote:
>
>> Hi,
>>
>> I've noticed a discrepancy between the chromosome information given for some of the probes of the Illumina Infinium 450K array in the GEO GPL info and in the corresponding Bioconductor annotation package.
>>
>> The first four probes in the 'data table' on the 450K GPL summary page on the GEO website (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL13534) are cg00035864, cg00050873, cg00061679 and cg00063477, and the corresponding value in the CHR column is Y for all four of these. However, in the corresponding Bioconductor annotation package IlluminaHumanMethylation450k.db, using IlluminaHumanMethylation450kCHR, the chromosome for these same 4 probes is given as Y, NONE, NONE and Y, respectively. N.B., the values in the MAPINFO column of the 'data table' and those found using IlluminaHumanMethylation450kCPGCOORDINATE are identical for these 4 probes.
>>
>> Is there any reason for this discrepancy, and might it be more widespread?
>>
>> Thanks in advance for your help
>>
>> Tom Bartlett
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
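The masking behaviour Tim and Marc describe can be pictured with a small Python toy model. This is only an illustration under stated assumptions. The probe and transcript identifiers are invented. `bimap_style_chr` and `select_style` are hypothetical stand-ins, not AnnotationDbi functions. The `show_multi` flag only imitates the effect of "unhiding" multi-mapped probes, which is what toggleProbes() is used for in this thread; it is not toggleProbes()'s real signature. In the model, a probe annotated to two transcripts comes back as "NONE" from the one-value-per-probe lookup, which is the mechanism the thread offers for the NONE values Tom saw. The long-format query returns every probe/transcript row instead.

```python
"""Toy model (not AnnotationDbi): single-mapping masking vs. a select()-style long table."""
from typing import Dict, List, Optional, Tuple

# Hypothetical probe -> transcript annotations; all identifiers are invented.
PROBE_TO_TRANSCRIPTS: Dict[str, List[str]] = {
    "probe_1": ["TX_A"],          # annotated to exactly one transcript
    "probe_2": ["TX_B", "TX_C"],  # annotated to two transcripts
}
# Hypothetical transcript -> chromosome lookup.
TRANSCRIPT_TO_CHR: Dict[str, str] = {"TX_A": "Y", "TX_B": "Y", "TX_C": "Y"}


def bimap_style_chr(probe: str, show_multi: bool = False) -> List[str]:
    """Environment-like lookup: by default only single-transcript probes get a value.

    show_multi is a hypothetical switch that loosely imitates "unhiding"
    multi-mapped probes; it is not the real toggleProbes() signature.
    """
    transcripts = PROBE_TO_TRANSCRIPTS.get(probe, [])
    if not transcripts or (len(transcripts) > 1 and not show_multi):
        return ["NONE"]  # masked (or simply unannotated)
    # Several transcripts can share a chromosome, so drop duplicates, keeping order.
    return list(dict.fromkeys(TRANSCRIPT_TO_CHR[t] for t in transcripts))


def select_style(probes: List[str]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Long-format query: one row per probe/transcript pair, nothing hidden."""
    rows: List[Tuple[str, Optional[str], Optional[str]]] = []
    for probe in probes:
        transcripts = PROBE_TO_TRANSCRIPTS.get(probe, [])
        if not transcripts:
            rows.append((probe, None, None))  # keep the probe, roughly like an NA row
        for tx in transcripts:
            rows.append((probe, tx, TRANSCRIPT_TO_CHR[tx]))
    return rows


if __name__ == "__main__":
    probes = ["probe_1", "probe_2"]
    print({p: bimap_style_chr(p) for p in probes})
    print({p: bimap_style_chr(p, show_multi=True) for p in probes})
    for row in select_style(probes):
        print(row)
```

Run as a script, the first line prints `{'probe_1': ['Y'], 'probe_2': ['NONE']}` and the second prints `{'probe_1': ['Y'], 'probe_2': ['Y']}`. After that come three rows, two of them for `probe_2`. The long table has no single-value constraint, which is why Marc suggests select over the bimaps.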
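Tom asks whether the discrepancy might be more widespread. One way to check is to put both sources into probe -> chromosome tables and compare them probe by probe. The Python sketch below assumes you have already exported the CHR column of the GEO GPL13534 data table and the package's CHR values into dictionaries; how to do that export is not shown. `compare_chr` is a hypothetical helper. The example data are only the four probes and values quoted in Tom's message.

```python
"""Sketch: list probes whose chromosome differs between two annotation tables."""
from collections import Counter
from typing import Dict, List, Tuple


def compare_chr(gpl: Dict[str, str], package: Dict[str, str]) -> Tuple[List[str], Counter]:
    """Compare two probe -> chromosome maps on the probes they share.

    Returns the mismatched probe IDs (sorted) and a Counter of
    (gpl_value, package_value) pairs, so systematic patterns such as
    Y -> NONE stand out. Probes present in only one table are ignored.
    """
    mismatched: List[str] = []
    kinds: Counter = Counter()
    for probe in sorted(gpl.keys() & package.keys()):
        gpl_chr, pkg_chr = gpl[probe], package[probe]
        if gpl_chr != pkg_chr:
            mismatched.append(probe)
            kinds[(gpl_chr, pkg_chr)] += 1
    return mismatched, kinds


if __name__ == "__main__":
    # Only the four probes and values quoted in Tom's message; loading the
    # full GPL13534 table and the package mapping is not shown here.
    gpl = {"cg00035864": "Y", "cg00050873": "Y", "cg00061679": "Y", "cg00063477": "Y"}
    pkg = {"cg00035864": "Y", "cg00050873": "NONE", "cg00061679": "NONE", "cg00063477": "Y"}
    mismatched, kinds = compare_chr(gpl, pkg)
    print(mismatched)  # ['cg00050873', 'cg00061679']
    print(kinds)       # Counter({('Y', 'NONE'): 2})
```

On the full tables, the `kinds` tally is the part to look at. A large count of `('…', 'NONE')` pairs would fit the thread's explanation of masked multi-mapped probes. Mismatches between two real chromosome names would point to something else.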

Annotation db0 probe annotate PROcess GenomicFeatures GenomicRanges
