Hi Tom,
Tim is right about using bimaps. Bimaps were invented to mimic the
behavior of R environments that were originally aimed at supporting
expression arrays.
If you really insist on using the bimaps, you could use the
toggleProbes() method he described to "unhide" your mappings. This
method was added to help with situations like this one (where people
really wanted to use probes that were mapping to multiple IDs).
Or (and I think this is probably an even better option for you) you
could just use the new select interface to extract these things.
Select doesn't have to play these games, since the legacy code that
expected the more restrictive behavior was written before we
implemented select. This freed us to do things a bit more universally
in its implementation. You can learn more about the new select interface
here:
http://www.bioconductor.org/packages/2.11/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf
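
As a rough, untested sketch of the two options above, using the four
probes from the original question: it assumes that
IlluminaHumanMethylation450kCHR is a probe-based bimap that
toggleProbes() accepts, and that the package exposes "PROBEID" as a
key type and "CHR" as a column for select(). Neither assumption is
confirmed in this thread, so check keytypes() and columns() first.
Older AnnotationDbi releases named the `columns` argument `cols`.

```r
# Sketch only: requires the IlluminaHumanMethylation450k.db package
# (which loads AnnotationDbi) to be installed.
library(IlluminaHumanMethylation450k.db)

# The four probes mentioned in the original question.
probes <- c("cg00035864", "cg00050873", "cg00061679", "cg00063477")

# Option 1: "unhide" probes that map to multiple IDs in the bimap.
# toggleProbes() returns a modified copy; the original bimap is unchanged.
chrAll <- toggleProbes(IlluminaHumanMethylation450kCHR, "all")
mget(probes, chrAll, ifnotfound = NA)

# Option 2: the select interface.
# Assumption: "PROBEID" and "CHR" are valid here; confirm with
# keytypes(IlluminaHumanMethylation450k.db) and
# columns(IlluminaHumanMethylation450k.db).
select(IlluminaHumanMethylation450k.db,
       keys = probes,
       columns = "CHR",
       keytype = "PROBEID")
```

With select, a probe that maps to several values would be expected to
come back as several rows rather than being masked.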
Hope this helps,
Marc
On 05/16/2012 03:59 PM, Tim Triche, Jr. wrote:
> toggleProbes() masks values where a probe is annotated to multiple
> transcripts as 'NONE' or 'NA' by default. Unfortunately, many (thousands)
> of the 450k probes are mapped to multiple transcripts in the manifest, and
> by default, the automatically generated bimap objects will treat them as if
> they were (degenerate) expression probes, masking them.
>
> I am attempting to address this by replacing the 450k.db, 27k.db, and
> 450kprobe packages with a faster, smaller, FeatureDb-based omnibus package
> that keeps track of the minimal information required to mask probes,
> annotate regions of interest, and process IDAT files, with all other
> operations (distance to TSS, chromosome, GC%, etc.) delegated to
> GenomicRanges and GenomicFeatures. In my experience this makes much more
> sense than using a framework that was originally created for expression
> probes. I didn't realize the difference when I first packaged the
> annotations into a SQLite database, which is why the 450k.db package uses
> the db0 machinery.
>
> Apologies for the confusion; hopefully this will be a memory as soon as I
> am up to speed on creating FeatureDb objects.
>
>
> --t
>
> On Wed, May 16, 2012 at 12:04 PM, Bartlett, Thomas<
> thomas.bartlett.10 at ucl.ac.uk> wrote:
>
>> Hi,
>>
>> I've noticed a discrepancy between the chromosome information given for
>> some of the probes of the Illumina Infinium 450K array in the GEO GPL info,
>> and in the corresponding Bioconductor annotation package.
>>
>> The first four probes on the 450K GPL summary page on the GEO website
>> http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL13534
>> in the 'data table' are cg00035864, cg00050873, cg00061679 and cg00063477,
>> and the corresponding value in the CHR column is Y for all four of these.
>> However, in the corresponding Bioconductor annotation package
>> IlluminaHumanMethylation450k.db, using IlluminaHumanMethylation450kCHR the
>> chromosome for these same 4 probes is given as Y, NONE, NONE and Y,
>> respectively. N.B., the values in the MAPINFO column of 'data table' and
>> those found using IlluminaHumanMethylation450kCPGCOORDINATE are identical
>> for these 4 probes.
>>
>> Is there any reason why there is this discrepancy, and might it be more
>> widespread?
>>
>> Thanks in advance for your help
>>
>> Tom Bartlett
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>