Inconsistency in illuminaHumanv4.db?
Mark Dunning ★ 1.1k
@mark-dunning-3319
Sheffield, UK
Hi Holger,

Apologies for the delay in replying to you about this.

The annotation packages I provide differ from most other Bioconductor annotation packages in that we have attempted to re-map the manufacturer's probes to the genome and transcriptome. The issues you have highlighted are due to our re-annotation, not to the building of the Bioconductor package. In most cases there should be only a single genomic location and no space-separated values.

I've checked the examples where we have space-separated values in the genomic location. There are 11 such cases among the 106343 unique human probes across all the chips we create annotations for. In every case, the annotation script was unable to find a genomic location from the BLAST searches against the transcriptomic sequence databases or the reference genome. It then makes a last-gasp attempt at getting a location through a BLAT search against the genome, and in these cases it has found multiple equal best-scoring hits.

There is a real problem with this BLAT search in that it only gets called right at the end, after the annotations (SNPs, repeats, etc.), and for those probes the quality is given as "No match". This will be corrected in future packages.

The BLAT searches are run against hg19 and are correct for the examples I've just been looking at. I think the problem may be that the BLAT hits are partial alignments, and in some cases only a small section of the probe aligns. An example is ILMN_1773455, which is given the genomic locations chr1:149906516:149906531:+ chr1:185572746:185572761:+.
These correspond to the following two BLAT hits, each covering only 16 of the 50 bases:

    >chr1
              Length = 249250621

     Score = 31 bits (80), Expect = 1e+00
     Identities = 16/16 (100%)
     Strand = Plus / Plus

    Query: 31        atgaagaagaacagtg 46
                     ||||||||||||||||
    Sbjct: 149906516 atgaagaagaacagtg 149906531

     Score = 31 bits (80), Expect = 1e+00
     Identities = 16/16 (100%)
     Strand = Plus / Plus

    Query: 21        tggaaatgctatgaag 36
                     ||||||||||||||||
    Sbjct: 185572746 tggaaatgctatgaag 185572761

Hope this helps,

Mark

On Wed, Nov 30, 2011 at 12:39 AM, Holger [guest] <guest at bioconductor.org> wrote:
> I am using illuminaHumanv4.db for my research, so first of all, thank you for maintaining this very valuable package!
>
> When working with the illuminaHumanv4listNewMappings, I realised that some genomic coordinates are separated with " " instead of ",". Almost all other multiple entries are separated with a ",". Additionally, the genomic positions of those entries do not seem to match the UCSC hg19 browser:
>
> require(illuminaHumanv4.db)
> test <- illuminaHumanv4fullReannotation()
> str(test)
> grep(" ", test$GenomicLocation, value=T)
> [1] "chr9:70645819:70645844:+ chr9:68298969:68298994:+ chr9:42251008:42251033:+ chr9:45442520:45442545:+"
> [2] "chr19:53832784:53832812:+ chr19:53268654:53268682:+"
> [3] "chr7:142008849:142008868:+ chr1:161139601:161139616:+"
> [4] "chr7:142008849:142008868:+ chr1:161139601:161139616:+"
> [5] "chrX:71034915:71034941:+ chrX:70888863:70888889:+ chrX:70885403:70885429:+"
> [6] "chrX:71034915:71034941:+ chrX:70888863:70888889:+ chrX:70885403:70885429:+"
> [7] "chr22:18979472:18979497:+ chrX:70888863:70888889:+ chrX:70885403:70885429:+"
> [8] "chr1:149906516:149906531:+ chr1:185572746:185572761:+"
> [9] "chr1:176811981:176811996:+ chr1:161139601:161139616:+"
>
> Is there any specific reason for this?
>
> When looking at illuminaHumanv4.db_1.10.0, other probes were affected, but the problem appeared to be present, too.
>
> -- output of sessionInfo():
>
> R version 2.14.0 (2011-10-31)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] illuminaHumanv4.db_1.12.1 org.Hs.eg.db_2.6.4
> [3] RSQLite_0.10.0            DBI_0.2-5
> [5] AnnotationDbi_1.16.5      Biobase_2.14.0
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.12.3
>
> --
> Sent via the guest posting facility at bioconductor.org.
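The space-separated GenomicLocation strings quoted in this thread can be split into one row per hit, which makes the short alignment lengths easy to see. The following is a minimal sketch in base R, not part of illuminaHumanv4.db: parse_locations() is a hypothetical helper, the two input strings are copied from the post, and it assumes each hit has the form chr:start:end:strand with hits separated by single spaces (comma-separated entries are not handled).

```r
# Example strings copied from the thread; in practice they would come from
# the GenomicLocation column of the reannotation table.
locs <- c(
  "chr1:149906516:149906531:+ chr1:185572746:185572761:+",
  "chr19:53832784:53832812:+ chr19:53268654:53268682:+"
)

# Hypothetical helper (not part of illuminaHumanv4.db): split each entry into
# its space-separated hits and return one row per hit.
parse_locations <- function(x) {
  hits <- strsplit(x, " ", fixed = TRUE)
  # Remember which input entry each hit came from.
  entry <- rep(seq_along(hits), lengths(hits))
  # Each hit is assumed to be chr:start:end:strand.
  fields <- do.call(rbind, strsplit(unlist(hits), ":", fixed = TRUE))
  out <- data.frame(
    entry  = entry,
    chr    = fields[, 1],
    start  = as.integer(fields[, 2]),
    end    = as.integer(fields[, 3]),
    strand = fields[, 4],
    stringsAsFactors = FALSE
  )
  # Inclusive coordinates, so the aligned length is end - start + 1.
  out$width <- out$end - out$start + 1L
  out
}

parsed <- parse_locations(locs)
print(parsed)
```

For the ILMN_1773455 entry, both rows have a width of 16, matching the 16-base partial alignments in the BLAT output. Comparing the width with the 50-base probe length is one possible way to flag such partial hits.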