Hi Lourdes,
Sorry for taking so long to get back to you. I went away for a few days
and somehow managed to miss your message.
Thanks for your interest in the packages! The probe quality scores are
derived from our mapping of probes to the genome and the transcriptome
using an in-house Perl script. The asterisks indicate issues in
consolidating the genomic and transcriptomic matches. Here is the full
explanation:
"Perfect/Good*** no CDS annotation - this can occur where all the
transcript alignment matches are to the reverse strand and/or are
GenBank entries for which we have no 5pUTR/3pUTR/CDS annotation."
i.e. the probe was found to match a transcript, but there is
insufficient information to class it as 3pUTR/5pUTR. The transcript
may be unreliable.
Perfect/Good**** mismatches for transcript alignment to the genome -
mismatches for transcript alignments to the genome are taken from the
UCSC annotation tables refSeqAli and all_mrna; **** is attached when
the probe quality is Perfect or Good, the genomic coordinates for the
best match from a BLAST search against the transcript databases differ
from those from a BLAST search against the reference genome, and there
is a mismatch in the transcript alignment to the genome.
i.e. the probe matches a transcript, but the transcript does not map to
the genomic location that we expect.
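If you want to treat the flagged and unflagged probes together, the
asterisks can be split off from the base grade. This is only a rough
sketch: the probe IDs and the `quality` vector are made up for
illustration, and the short flag labels are my own summary of the
explanations above, not something stored in the package.

```r
# Hypothetical quality strings covering the categories discussed above
quality <- c(ILMN_a = "Perfect", ILMN_b = "Perfect***", ILMN_c = "Good****",
             ILMN_d = "Bad", ILMN_e = "No match")

# Base grade with any trailing asterisks removed
base_grade <- sub("\\*+$", "", quality)

# Number of trailing asterisks (0 when the grade carries no flag)
n_stars <- nchar(quality) - nchar(base_grade)

# Short labels for the two flags (labels are an assumption, see above)
flag_labels <- c("0" = "none",
                 "3" = "no CDS annotation",
                 "4" = "transcript/genome mismatch")
flag <- unname(flag_labels[as.character(n_stars)])

# One row per probe: original string, base grade, and flag
print(data.frame(quality, base_grade, flag))
```

With the real data, `quality` would be the unlisted PROBEQUALITY
mapping, e.g. the `probe_quality_re` vector from your code.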
The missing Probe Quality values for those two probes are accidental.
The relevant rows of the source file I use to compile the annotation
packages are as follows:
grep ILMN_1229593 Annotation_Illumina_Mouse_WG-6_V2_mm9_Sept2011.txt
ILMN_1229593 AACTGGCCCACCTTCAACACTCCCTCTAGGCACCCAGACCTCTAGTGGCA
50 chr15:63942585:63942634:- 15qD1 0
1-50
|||||||||||||||||||||||||||||||||||||||||||||||||| 50 100
100 NM_010026 1
of 1 (Asap1) uc007vzk.1 uc007vzj.1 uc007vzi.1 uc007vzh.1 4 of 6
(Asap1) BC094581 BC048818 BC002201 AK122477 AF075461 AK147689 6 of
381
(Asap1)6 X 6 6 6 6 6 6 6 6 7 ENSMUST00000110115 ENSMUST00000023008
2
of 3 (ENSMUSG00000022377) 65301463 63101607 28981428 12805456
28972685
4063613 74188670 NP_034156.2 Q9QWY8 Q9QWY8
No 1-50
|||||||||||||||||||||||||||||||||||||||||||||||||| 50 100
100 U92478 1-50
|||||||||||||||||||||||||
|||||||||||||||||||||||| 50 98 98 Asap1
ENSMUSG00000022377 Mm.27723613196 ArfGAP
with SH# domain, ankyrin repeat and PH
domain1 Yes Transcriptomic Yes 58 0
Perfect 006280286
grep ILMN_2694153 Annotation_Illumina_Mouse_WG-6_V2_mm9_Sept2011.txt
ILMN_2694153 GTTTAGATGAGTGGGTTTGTACATCTTATGGCGAGTGGCCACCCCTGAGA
50 chr15:63920345:63920394:- 15qD1 0
1-50
|||||||||||||||||||||||||||||||||||||||||||||||||| 50 100
100 NM_010026 1
of 1 (Asap1) uc007vzm.1 uc007vzl.1 uc007vzk.1 uc007vzj.1 uc007vzi.1
uc007vzh.1 6 of 6 (Asap1) U92478 BC094581 BC048818 BC002201
AK122477
AF075462 AF075461 AK166056 AK159048 AK146545 BB821218 AK147689 11 of
381 (Asap1) 1 1 1 X 1 1 1 1 1 1 1 1 1 1 X 1 X X 1
ENSMUST00000110114
ENSMUST00000110115 ENSMUST00000023008 3 of 3
(ENSMUSG00000022377) 65301463 1928965 63101607 28981428 12805456
28972685 4063615 4063613 74141548 74186632 74138896 16993847
74188670 NP_034156.2 Q9QWY8 Q9QWY8 Q9QWY8
Q9QWY8 No 1-50
|||||||||||||||||||||||||||||||||||||||||||||||||| 50 100
100 Asap1
ENSMUSG00000022377 Mm.27723613196 ArfGAP
with SH# domain, ankyrin repeat and PH
domain1 Yes Transcriptomic Yes 50 0
Perfect 001010528
There is a # character in the description of both probes. By default R
treats everything that follows a # as a comment, so the remaining
fields on those rows, including the probe quality, are not read in. I
shall correct this in future versions of the annotation. Thanks for
spotting this. Both probes are Perfect, by the way.
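For illustration, here is a minimal sketch of that behaviour using
read.table. The three-column layout and the second probe are invented
for the example (the real file has many more columns), and I am not
claiming this is exactly how the package build reads the file. The
point is just that the default comment.char = "#" truncates the row,
while comment.char = "" keeps it whole.

```r
# Hypothetical tab-delimited sample; the real annotation file is much wider
lines <- c(
  "ProbeId\tDescription\tQuality",
  "ILMN_X\tArfGAP with SH# domain\tPerfect",
  "ILMN_Y\tSome other gene\tGood"
)

# Default settings: everything after '#' is dropped, so the first row
# loses the end of its description and its quality field.
# fill = TRUE pads the short row instead of raising an error.
default_read <- read.table(text = lines, sep = "\t", header = TRUE,
                           fill = TRUE, stringsAsFactors = FALSE)

# Comment handling (and quote handling) switched off: the row is read whole
full_read <- read.table(text = lines, sep = "\t", header = TRUE,
                        quote = "", comment.char = "",
                        stringsAsFactors = FALSE)

print(default_read)
print(full_read)
```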
Regards,
Mark
On Tue, Feb 14, 2012 at 7:01 PM, Lourdes Peña Castillo
<lourdes.pena at gmail.com> wrote:
> Hello,
>
> I am using the re-annotation of Illumina probe sequences available in
> illuminaMousev2.db (great package!), and I have two questions (please
> see code below as well):
>
> 1) Is there any difference between Good and Good*** or Perfect and
> Perfect**** probe quality?
>
> 2) I noticed there are two probes re-annotated to an EntrezID without
> probe quality, why would this be?
>
> Thanks!
>
> Lourdes
>
>> library("illuminaMousev2.db")
>
>> x <- illuminaMousev2ENTREZREANNOTATED
>
>> mapped_probes <- mappedkeys(x)
>
>> xx <- as.list(x[mapped_probes])
>
>> probe_EntrezID_re <- unlist(xx)
>
>>
>
>> x <- illuminaMousev2PROBEQUALITY
>
>> mapped_probes <- mappedkeys(x)
>
>> # Convert to a list
>
>> xx <- as.list(x[mapped_probes])
>
>> probe_quality_re <- unlist(xx)
>
>>
>
>> table(probe_quality_re[intersect(names(probe_EntrezID_re),
> names(probe_quality_re))])
>
>
>         Bad        Good     Good***    Good****    No match     Perfect  Perfect*** Perfect****
>
>        3657         996          38         302          79       31819        1719        1047
>
>>
>
>> setdiff(names(probe_EntrezID_re), names(probe_quality_re))
>
> [1] "ILMN_1229593" "ILMN_2694153"
>
>> probe_quality_re[c("ILMN_1229593", "ILMN_2694153")]
>
> <NA> <NA>
>
>   NA   NA
>
>>
>
>> sessionInfo()
>
> R version 2.14.1 (2011-12-22)
>
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
>
> locale:
>
> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>
>
> attached base packages:
>
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
>
> other attached packages:
>
>  [1] gplots_2.10.1             KernSmooth_2.23-7         caTools_1.12              bitops_1.0-4.1
>
>  [5] gdata_2.8.2               gtools_2.6.2              limma_3.10.2              illuminaMousev2.db_1.12.1
>
>  [9] org.Mm.eg.db_2.6.4        RSQLite_0.11.1            DBI_0.2-5                 AnnotationDbi_1.16.15
>
> [13] Biobase_2.14.0            BiocInstaller_1.2.1
>
>
> loaded via a namespace (and not attached):
>
> [1] IRanges_1.12.6 tools_2.14.1
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor