Hi there
I have been using the R package illuminaHumanv4.db to annotate our HT12 v4 array probes, and I have two questions:
* The description of the package says the data is assembled from public repositories. However, the reference manual notes that extensive reannotation has been carried out for the Illumina probes. Am I right in thinking that the reannotation (i.e. genomic locations, EnsemblReannotated IDs, etc.) comes from the paper:
A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data
http://nar.oxfordjournals.org/content/38/3/e17/F1.expansion.html
If not, can someone shed some light on where this reannotation comes from, or point me to a citation describing how it was done?
* I am looking to map the probe IDs to Ensembl transcript IDs, not just gene IDs. The package doesn't have this information (only Ensembl gene IDs). Could I obtain it somewhere? Perhaps information on the first point would help: I know Ensembl has these, but if the annotation differs, I can't go that route.
Many thanks!
Vicky

Thanks Efstathios,
I didn't notice that; I mostly looked at the documentation. Would you happen to know what the difference between ENSEMBL and EnsemblReannotated (illuminaHumanv4ENSEMBLREANNOTATED) is?
I would assume that the former is derived directly from the Ensembl annotation, while the latter comes from a custom reannotation (and for which I cannot find transcript names). There are quite a few discrepancies between the Ensembl gene IDs given by the two.
From a quick glance at the first few, the ENSEMBL IDs are not exactly the same as those returned by BioMart either.
Thanks,
Vicky
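To put a number on the discrepancies between ENSEMBL and ENSEMBLREANNOTATED described above, one could tabulate, per probe, whether the two maps give the same set of Ensembl gene IDs. This is a minimal sketch, assuming both maps can be flattened with AnnotationDbi's toTable() into probe/gene pairs (probe in the first column, gene in the second); compare_probe_maps() is a hypothetical helper, not part of the package.

```r
# Sketch only: quantify how two probe -> Ensembl gene mappings disagree.

# Hypothetical helper. 'a' and 'b' are data.frames with one row per
# probe/gene pair: probe ID in column 1, Ensembl gene ID in column 2.
compare_probe_maps <- function(a, b) {
  # Collapse each table to one sorted, de-duplicated gene string per probe
  collapse <- function(tab) {
    genes <- split(as.character(tab[[2]]), as.character(tab[[1]]))
    vapply(genes, function(g) paste(sort(unique(g)), collapse = ";"),
           character(1))
  }
  ga <- collapse(a)
  gb <- collapse(b)
  probes <- sort(union(names(ga), names(gb)))
  first  <- unname(ga[probes])   # NA where the probe is absent from 'a'
  second <- unname(gb[probes])   # NA where the probe is absent from 'b'
  status <- ifelse(is.na(second), "only_first",
            ifelse(is.na(first), "only_second",
            ifelse(first == second, "same", "different")))
  data.frame(probe = probes, first = first, second = second,
             status = status, stringsAsFactors = FALSE)
}

# Usage with the package, skipped if it is not installed. Assumes both maps
# are Bimap objects that toTable() can flatten into probe/gene pairs.
if (requireNamespace("illuminaHumanv4.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(illuminaHumanv4.db))
  cmp <- compare_probe_maps(toTable(illuminaHumanv4ENSEMBL),
                            toTable(illuminaHumanv4ENSEMBLREANNOTATED))
  print(table(cmp$status))
}
```

Comparing sorted gene sets rather than single IDs means a probe assigned to two genes in one map and one gene in the other is counted as "different" rather than silently matched on its first entry.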
You mean you tried using different annotations from the above options as "columns"? Well, I'm not sure about your assumption: in my case I mostly used gene symbols and Entrez IDs (and, in my naive opinion, those are enough). On the other hand, if your specific experimental design requires Ensembl annotations in particular, that is another matter.
Hi Efstathios
There are various HT12 re-annotations being published all the time, so the challenge is to find the most reliable one. For the illuminaHumanv4.db package, I am trying to identify what this reannotation is and how it differs from something like standard Ensembl. I am unclear which columns correspond to which reannotation (and where each comes from), so I can't draw any conclusions.
Have you considered looking at the help pages? Does ?illuminaHumanv4ENSEMBLREANNOTATED not answer your questions?
Hi James,
Yes, that is where I got the information in my original post (point 1) above. I hadn't noticed the ENSEMBL-only annotation (mentioned by Efstathios), which clarifies my assumption.
I still cannot find transcript IDs for the reannotation pipeline (EnsemblReannotated, etc.), though. Maybe it's plainly obvious and I am just not seeing it?
I don't think there are any transcript IDs annotated in that package, and given that the probes are 50-mers, I sort of doubt many of them can be inferred to be transcript-specific anyway. But do note that the package does give the re-mapped genomic locations.
```
> z <- illuminaHumanv4fullReannotation()
> z[5000:5010,]
     IlluminaID   ArrayAddress NuID               ProbeQuality CodingZone
5000 ILMN_1824016 2190088      rXqf5ofj79Tcp.4Xu0 Perfect***   Transcriptomic?
5001 ILMN_1709092 3990441      61RbnpngeHtZ5QqC4Q Perfect      Transcriptomic
5002 ILMN_2321292 1410075      ZoF6LQAgefS1WexZeU Bad          Transcriptomic
5003 ILMN_2324998 5910138      EiJOOub_JtLCYiAoaY Perfect      Transcriptomic
5004 ILMN_1662334 6560445      Tl3nrJO2S.U7v0o32o Perfect      Transcriptomic
5005 ILMN_1715417 4810468      Z6OkUinkgpQg0JyClE Perfect      Transcriptomic
5006 ILMN_1795218 1340193      Nnhs.TpLQtQ6SbfbWo Perfect      Transcriptomic
5007 ILMN_2289093 7400743      KVb81O7U_GlNzvn32k Bad          Transcriptomic
5008 ILMN_1821127 7550600      l1JAYAnj_uQYfoOVUo Bad          Transcriptomic?
5009 ILMN_1739751 7040647      oRXtR_rjT1IVdVATkw Perfect      Transcriptomic
5010 ILMN_1674650 1980180      rdXdSomTIxJInRLi0g Perfect      Transcriptomic
     ProbeSequence                                      SecondMatches
5000 CCTGGGCTTTGCGGACTTGATTGTTTCCATCTAGGCTTTTGACCTGTGTC <NA>
5001 TCCCACCGTGCTGGCGCTGAACTGACTGTCCGCTGCCAAGGGAAGTGACA <NA>
5002 GGAACCTGGAGTCAAAAAGAACTGCTTCAGTCCCCGCTGTACCGCCTGCC <NA>
5003 GAGAGCATGATGGTGCGTTTGAGCGTCAGTAAGCGAGAGAAAGGACGGCG <NA>
5004 GCCTCTGCTGGTAGCATGTCGCAGTTTCCATGTGTTTCAGGATCTTCGGG <NA>
5005 TGGATGGCACCAGAGGCTGCAGAAGGCCAAGAATCAAGCTAGAAGGCCAC <NA>
5006 GCTGACGTATTTCATGGCAGTCAAGTCCAATGGCAGCGTCTTCGTCCGGG <NA>
5007 CCCCGTTTATCCATGTGTCCATTGACGGCCATCTATGTTGCTTCTTCGGC <NA>
5008 TCCAGCAAACGAAAAGCTGATTTGGTGCAACGACTTGGAATGCCCCCAGG <NA>
5009 CACCCTGTCCACTTGGGTGATCATTCCAGACCCCTCCCCAAACATGCATA <NA>
5010 CTCCCTCTCCAGGGAGCGCATAGATACAGCAGAGCTCACAGTGAGTCAGA <NA>
     OtherGenomicMatches RepeatMask       OverlappingSNP
5000 <NA>                <NA>             <NA>
5001 <NA>                <NA>             <NA>
5002 <NA>                MIRb_SINE_MIR:50 rs114937162
5003 <NA>                <NA>             <NA>
5004 <NA>                <NA>             <NA>
5005 <NA>                <NA>             rs111784512
5006 <NA>                <NA>             <NA>
5007 <NA>                L1MB7_LINE_L1:50 rs79267010
5008 <NA>                <NA>             rs12902628
5009 <NA>                <NA>             rs116742961
5010 <NA>                <NA>             rs111428370 rs117366822
     EntrezReannotated GenomicLocation            SymbolReannotated
5000 <NA>              chrX:48365282:48365331:-   BG119374
5001 150000            chr21:15646319:15646368:+  ABCC13
5002 26100             chr7:5273232:5273281:+     WIPI2
5003 25983             chr14:23946422:23946471:+  NGDN
5004 9093              chr16:4506559:4506608:+    DNAJA3
5005 6403              chr1:169558180:169558229:- SELP
5006 22907             chr3:47891147:47891196:+   DHX30
5007 57674             chr17:78295191:78295240:+  RNF213
5008 <NA>              chr15:59392374:59392423:+  CK905457
5009 284129            chr17:78227104:78227153:+  SLC26A11
5010 54981             chr9:77676212:77676261:-   C9orf95
     ReporterGroupName ReporterGroupID EnsemblReannotated
5000 <NA>              <NA>            ENSG00000224292
5001 <NA>              <NA>            ENSG00000243064
5002 <NA>              <NA>            ENSG00000157954
5003 <NA>              <NA>            ENSG00000129460
5004 <NA>              <NA>            ENSG00000103423
5005 <NA>              <NA>            ENSG00000174175
5006 <NA>              <NA>            ENSG00000132153
5007 <NA>              <NA>            ENSG00000173821
5008 <NA>              <NA>            <NA>
5009 <NA>              <NA>            ENSG00000181045
5010 <NA>              <NA>            ENSG00000106733
```
And you could pretty easily create a GRanges with those data, and then use findOverlaps() on the transcripts() from a TxDb that you could get by running makeTxDbFromBiomart(), to decide which transcript(s) a given probe will bind to.
Hi James,
Thank you for the tips. I thought that might be too time-consuming, but your suggestions should get me there faster!
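James's suggestion (build a GRanges from the GenomicLocation column, then overlap it with transcripts from a TxDb made by makeTxDbFromBiomart()) might look roughly like this. It is a minimal sketch under several assumptions: parse_genomic_location() and probes_to_transcripts() are hypothetical helpers; GenomicLocation is taken to be chr:start:end:strand as in the output above, with comma-separated entries if a probe has several locations (an unverified assumption); and it overlaps probes with exons grouped by transcript, exonsBy(txdb, by = "tx", use.names = TRUE), instead of transcripts(), because transcript spans include introns. The probe coordinates and the TxDb must be on the same genome build; the GRCh37 host in the commented example assumes the reannotation uses hg19, so check that in the package documentation. In recent Bioconductor releases makeTxDbFromBiomart() is provided by txdbmaker rather than GenomicFeatures.

```r
# Sketch only: map probe locations from illuminaHumanv4fullReannotation()
# to Ensembl transcripts by genomic overlap.

# Hypothetical helper: split "chr:start:end:strand" strings into columns.
# Assumption: if a probe has several locations, they are comma-separated.
parse_genomic_location <- function(probe_id, location) {
  probe_id <- as.character(probe_id)   # guard against factor columns
  location <- as.character(location)
  keep <- !is.na(location)
  pieces <- strsplit(location[keep], ",", fixed = TRUE)
  ids <- rep(probe_id[keep], lengths(pieces))
  fields <- do.call(rbind, strsplit(unlist(pieces), ":", fixed = TRUE))
  data.frame(probe  = ids,
             chr    = fields[, 1],
             start  = as.integer(fields[, 2]),
             end    = as.integer(fields[, 3]),
             strand = fields[, 4],
             stringsAsFactors = FALSE)
}

# Hypothetical helper: report every (probe, transcript) pair that overlaps.
# 'tx' must be named by transcript, e.g. the GRangesList returned by
# exonsBy(txdb, by = "tx", use.names = TRUE). Needs GenomicRanges attached.
probes_to_transcripts <- function(loc, tx) {
  probe_gr <- GRanges(seqnames = loc$chr,
                      ranges = IRanges(start = loc$start, end = loc$end),
                      strand = loc$strand)
  # The reannotation uses UCSC-style names ("chrX"); Ensembl uses "X"
  seqlevelsStyle(probe_gr) <- seqlevelsStyle(tx)[1]
  hits <- findOverlaps(probe_gr, tx)
  data.frame(probe      = loc$probe[queryHits(hits)],
             transcript = names(tx)[subjectHits(hits)],
             stringsAsFactors = FALSE)
}

# Example use (not run: needs illuminaHumanv4.db, GenomicRanges,
# GenomicFeatures or txdbmaker, and network access to BioMart):
# z    <- illuminaHumanv4fullReannotation()
# loc  <- parse_genomic_location(z$IlluminaID, z$GenomicLocation)
# txdb <- makeTxDbFromBiomart(dataset = "hsapiens_gene_ensembl",
#                             host = "https://grch37.ensembl.org")
# tx   <- exonsBy(txdb, by = "tx", use.names = TRUE)
# head(probes_to_transcripts(loc, tx))
```

Only parse_genomic_location() runs on base R alone. Since the probes are 50-mers, expect many of them to hit several transcripts of the same gene, which is in line with James's point that few probes are likely to be transcript-specific.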