Mark Cowley
Dear list,
I've read illuminaHumanv4.db.pdf, and it's not clear to me how the
mappings are built. From the short package description, I thought the
RefSeq IDs from the Illumina array manifest would be used, but
according to the PDF manual the mappings seem to be based on ACCNUM,
and we're not told where the ACCNUM values come from. The only hint is
in ?illuminaHumanv4ACCNUM: "For chip packages such as this, the ACCNUM
mapping comes directly from the manufacturer."
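One place to look for where a chip package's mappings come from is the metadata table stored in its SQLite database. A minimal sketch, assuming the `<prefix>_dbInfo()` accessor that AnnotationDbi-generated chip packages export (here `illuminaHumanv4_dbInfo()`); the helper name `chip_metadata` is hypothetical, it returns `NULL` when the package is not installed, and which source and date fields appear depends on the package version.

```r
# Return the metadata table (data sources, source dates, versions)
# bundled with a chip annotation package, or NULL if it isn't installed.
# chip_metadata() is a hypothetical helper, not part of any package.
chip_metadata <- function(pkg = "illuminaHumanv4.db") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NULL)
  }
  # Assumes the package exports <prefix>_dbInfo(), e.g. illuminaHumanv4_dbInfo()
  prefix <- sub("\\.db$", "", pkg)
  info_fun <- getExportedValue(pkg, paste0(prefix, "_dbInfo"))
  info_fun()
}

meta <- chip_metadata()
if (!is.null(meta)) print(meta)
```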
I raise the question because the illuminaHumanv4SYMBOL table contains
no probes for the CTNND1 gene, whereas according to the manifest file
there are 5 probes that should map to it:
# from the manifest:
$ grep -w CTNND1 HumanHT-12_V4_0_R2_15002873_B.txt | cut -f3,6,5,14
#Search_Key ILMN_Gene RefSeq_ID Probe_Id
XM_943087.1 CTNND1 XM_943087.1 ILMN_1651944
XM_937008.1 CTNND1 XM_937008.1 ILMN_1807510
XM_943098.1 CTNND1 NM_001085458.1 ILMN_1696806
XM_943098.1 CTNND1 XM_943098.1 ILMN_1663159
NM_001331.1 CTNND1 NM_001331.1 ILMN_2293511
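The same lookup can be done in R once the manifest's probe table is in a data frame. A minimal sketch: `manifest_probes_for_gene` is a hypothetical helper, the column names are assumed to match the manifest header, and reading the real .txt file usually needs a `skip =` value in `read.delim()` because the probe table is preceded by a header block. The example rebuilds only the five rows shown above.

```r
# Keep the manifest rows whose ILMN_Gene matches `gene`, restricted to a
# few informative columns. %in% avoids NA rows when ILMN_Gene is missing.
manifest_probes_for_gene <- function(manifest, gene,
                                     cols = c("Search_Key", "ILMN_Gene",
                                              "RefSeq_ID", "Probe_Id")) {
  hits <- manifest[manifest$ILMN_Gene %in% gene, cols, drop = FALSE]
  rownames(hits) <- NULL
  hits
}

# The five CTNND1 rows from the grep output above
manifest <- data.frame(
  Search_Key = c("XM_943087.1", "XM_937008.1", "XM_943098.1",
                 "XM_943098.1", "NM_001331.1"),
  ILMN_Gene  = rep("CTNND1", 5),
  RefSeq_ID  = c("XM_943087.1", "XM_937008.1", "NM_001085458.1",
                 "XM_943098.1", "NM_001331.1"),
  Probe_Id   = c("ILMN_1651944", "ILMN_1807510", "ILMN_1696806",
                 "ILMN_1663159", "ILMN_2293511"),
  stringsAsFactors = FALSE
)

ctnnd1 <- manifest_probes_for_gene(manifest, "CTNND1")
print(ctnnd1$Probe_Id)
```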
# from the illuminaHumanv4.db package
> require(illuminaHumanv4.db)
> ids <- c("ILMN_1651944", "ILMN_1807510", "ILMN_1696806",
+ "ILMN_1663159", "ILMN_2293511")
> unlist(mget(ids, illuminaHumanv4SYMBOL))
ILMN_1651944 ILMN_1807510 ILMN_1696806 ILMN_1663159 ILMN_2293511
NA NA NA NA NA
> unlist(mget(ids, illuminaHumanv4REFSEQ))
ILMN_1651944 ILMN_1807510 ILMN_1696806 ILMN_1663159 ILMN_2293511
NA NA NA NA NA
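With a more recent Bioconductor release, the SYMBOL, ACCNUM and REFSEQ lookups can be made in a single call to `AnnotationDbi::select()`, which makes it easier to see which probes are unmapped in every column. A minimal sketch, assuming a release in which AnnotationDbi provides `select()` and the package exports its `ChipDb` object under the package name; `probe_annotation` is a hypothetical helper that returns `NULL` if either package is missing, and the results depend on the annotation version installed.

```r
# Look up SYMBOL, ACCNUM and REFSEQ for a set of probe IDs in one query.
# probe_annotation() is a hypothetical helper; returns NULL when the
# annotation packages are not installed.
probe_annotation <- function(ids, pkg = "illuminaHumanv4.db") {
  if (!requireNamespace(pkg, quietly = TRUE) ||
      !requireNamespace("AnnotationDbi", quietly = TRUE)) {
    return(NULL)
  }
  db <- getExportedValue(pkg, pkg)  # the ChipDb object, e.g. illuminaHumanv4.db
  AnnotationDbi::select(db, keys = ids,
                        columns = c("SYMBOL", "ACCNUM", "REFSEQ"),
                        keytype = "PROBEID")
}

ids <- c("ILMN_1651944", "ILMN_1807510", "ILMN_1696806",
         "ILMN_1663159", "ILMN_2293511")
res <- probe_annotation(ids)
if (!is.null(res)) print(res)
```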
# why are there no RefSeq IDs for these probes?
> mget(ids, illuminaHumanv4ACCNUM)
$ILMN_1651944
[1] NA
$ILMN_1807510
[1] NA
$ILMN_1696806
[1] "NM_001085458" "NM_001085459" "NM_001085460" "NM_001085461" "NM_001085462"
[6] "NM_001085463" "NM_001085464" "NM_001085465" "NM_001085466" "NM_001085467"
[11] "NM_001085468" "NM_001085469" "NM_001331" "NR_037646"
$ILMN_1663159
[1] NA
$ILMN_2293511
[1] "NM_001085458" "NM_001085459" "NM_001085460" "NM_001085461" "NM_001085462"
[6] "NM_001085463" "NM_001085464" "NM_001085465" "NM_001085466" "NM_001085467"
[11] "NM_001085468" "NM_001085469" "NM_001331" "NR_037646"
# all of these RefSeq IDs correspond to Entrez Gene ID 1500, CTNND1,
# catenin (cadherin-associated protein), delta 1 [Homo sapiens]
# why do 3 probes not have an ACCNUM?
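To see at a glance which probes are unmapped, an `mget()` result can be flattened to one row per probe/accession pair and counted per probe. A minimal sketch: `flatten_mapping` and `count_accessions` are hypothetical helpers, and the input list is rebuilt from the ACCNUM output above rather than queried from the package.

```r
# Flatten a probe -> accessions list (the shape mget() returns) into a
# data frame with one row per pair; NA entries are kept as NA rows.
flatten_mapping <- function(mapping) {
  data.frame(probe = rep(names(mapping), lengths(mapping)),
             accession = unlist(mapping, use.names = FALSE),
             stringsAsFactors = FALSE)
}

# Number of non-NA accessions per probe (0 means unmapped)
count_accessions <- function(mapping) {
  vapply(mapping, function(x) sum(!is.na(x)), integer(1))
}

# NM_001085458 ... NM_001085469, then NM_001331 and NR_037646 (14 IDs)
ctnnd1_acc <- c(sprintf("NM_%09d", 1085458:1085469), "NM_001331", "NR_037646")
accnum <- list(
  ILMN_1651944 = NA_character_,
  ILMN_1807510 = NA_character_,
  ILMN_1696806 = ctnnd1_acc,
  ILMN_1663159 = NA_character_,
  ILMN_2293511 = ctnnd1_acc
)

counts <- count_accessions(accnum)
print(counts)
print(names(which(counts == 0)))  # probes with no ACCNUM
```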
If I BLAST all 5 probe sequences, the 3 probes with NA in ACCNUM (see
above) all align to NG_029078.1 (=CTNND1) but not to NM_001331
(=CTNND1), whereas the 2 probes with many ACCNUM IDs align to both
NG_029078.1 and NM_001331, amongst many others.
> mget(ids, illuminaHumanv4PROBESEQUENCE)
# (shown as FASTA, with each probe's BLAST hits after the "->")
>ILMN_1651944 -> NG_029078.1
GAAGGACCCTCCCCCGCTTCATAGTTTATGAATGCGAGAGTTGGTAAGGG
>ILMN_1807510 -> NG_029078.1
CGGTCATTCTCTGCCATCCCTAGAAAGAATGTCCAATCCACTGCCTTTGT
>ILMN_1696806 -> NG_029078.1, NM_001331, many others
GACCATCCCAAAAAGGAAGTGCACCTTGGAGCCTGTGGAGCTCTCAAGAA
>ILMN_1663159 -> NG_029078.1
GCCTATTCTTTAGCCTCCATTCCTATCTGTATTGCATACTGTAACTCCAA
>ILMN_2293511 -> NG_029078.1, NM_001331, many others
ATCCAGACTTTGGGTCGTGATTTCCGCAAGAATGGCAATGGGGGACCTGG
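The pattern described above can be checked with a simple cross-tabulation of "aligns to NM_001331" against "has an ACCNUM". A minimal sketch using only the five probes' results as reported in this message; it summarises the observation and says nothing about why the mapping behaves this way.

```r
# BLAST outcome per probe, from the hits listed above:
# TRUE = also aligns to NM_001331, FALSE = aligns only to NG_029078.1
aligns_mrna <- c(ILMN_1651944 = FALSE, ILMN_1807510 = FALSE,
                 ILMN_1696806 = TRUE,  ILMN_1663159 = FALSE,
                 ILMN_2293511 = TRUE)

# Whether illuminaHumanv4ACCNUM returned any accession for the probe
has_accnum <- c(ILMN_1651944 = FALSE, ILMN_1807510 = FALSE,
                ILMN_1696806 = TRUE,  ILMN_1663159 = FALSE,
                ILMN_2293511 = TRUE)

# Rows: aligns to NM_001331; columns: has ACCNUM
tab <- table(aligns_to_NM_001331 = aligns_mrna,
             has_ACCNUM = has_accnum[names(aligns_mrna)])
print(tab)
```

Every probe falls on the diagonal of the table, i.e. in these five cases ACCNUM is present exactly when the probe also hits the mRNA.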
I'd really love to get to the bottom of this: the R annotation
packages are very rich, but missing IDs make it hard to know whether
they're better than the manufacturer's manifest files.
cheers,
Mark
-----------------------------------------------------
Mark Cowley, PhD
Pancreatic Cancer Program | Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research, Sydney, Australia
-----------------------------------------------------
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] graphics  datasets  grDevices utils     grid      stats     methods
[8] base
other attached packages:
[1] illuminaHumanv4.db_1.10.0 org.Hs.eg.db_2.5.0
[3] RSQLite_0.9-4 DBI_0.2-5
[5] AnnotationDbi_1.14.1 limma_3.8.3
[7] mjcdev_1.0 Cairo_1.4-9
[9] metaGSEA_1.0.2 pwbc_1.0.3
[11] lumidat_1.0.1 lumi_2.4.0
[13] nleqslv_1.8.6 updateR_1.0.4
[15] roxygen_0.1-3 digest_0.5.0
[17] codetools_0.2-8 haselst_0.1
[19] blat_0.1 genomics_0.1
[21] mjcbase_0.1 GEOquery_2.19.2
[23] cor_0.1 xtable_1.5-6
[25] rgl_0.92.798 qvalue_1.26.0
[27] igraph_0.5.5-2 graph_1.30.0
[29] XML_3.4-2 SparseM_0.89
[31] Biobase_2.12.2 sos_1.3-1
[33] brew_1.0-6 gplots_2.8.0
[35] caTools_1.12 bitops_1.0-4.1
[37] gdata_2.8.1 gtools_2.6.2
loaded via a namespace (and not attached):
[1] affy_1.30.0 affyio_1.20.0 annotate_1.30.0
[4] hdrcde_2.15 KernSmooth_2.23-6 lattice_0.19-30
[7] MASS_7.3-13 Matrix_0.999375-50 methylumi_1.8.0
[10] mgcv_1.7-6 nlme_3.1-101 preprocessCore_1.14.0
[13] RCurl_1.6-7 tcltk_2.13.1 tools_2.13.1