Hi there
I have been using the R package illuminaHumanv4.db to annotate our HT12 v4 array probes, and I have two questions:
* The description of the package says the data is assembled from public repositories. However, the reference manual notes that extensive reannotation has been carried out for the Illumina probes. Am I right in thinking that the reannotation (i.e. genomic location, EnsemblReannotated IDs, etc.) is from the paper:
A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data
http://nar.oxfordjournals.org/content/38/3/e17/F1.expansion.html
If not, can someone shed some light as to where this reannotation is coming from / citation of how it was redone?
* I am looking to map the probe IDs to Ensembl transcript names, not just the gene names. The package doesn't have this information (only Ensembl gene names). Could I obtain this somewhere? Perhaps information on the first point might help. I know Ensembl has these, but if the annotation differs, I can't go this route.
Many thanks!
Vicky
Thanks Efstathios,
I didn't notice that, I mostly looked at the documentation. Would you happen to know what the difference between ENSEMBL and EnsemblReannotated (illuminaHumanv4ENSEMBLREANNOTATED) is?
I would assume that the former is derived directly from the Ensembl annotation, while the latter comes from a custom reannotation (for which I cannot find transcript names). There are quite a few discrepancies between the two sets of Ensembl gene names above.
From a quick glance at the top few, the ENSEMBL names are not exactly the same as the ones returned by BioMart either.
Thanks,
Vicky
You mean you tried using different annotations from the above options as "columns"? Well, I'm not sure about your assumption, as in my case I mostly used gene symbols and Entrez IDs (and in my naive opinion, I believe those are enough). On the other hand, if your specific experimental design needs Ensembl annotations in particular, that is another matter.
Hi Efstathios
There are various HT12 re-annotations being published all the time, so the challenge is to find the one that is most reliable. For the illuminaHumanv4.db package, I am trying to identify what this reannotation is, and how it differs from something like standard Ensembl. I am unclear which columns correspond to which reannotation (and where this comes from), so I can't come to any conclusions.
Have you considered looking at the help pages? Does ?illuminaHumanv4ENSEMBLREANNOTATED not answer your questions?
Hi James,
Yes, that is where I obtained the information in my original post (point 1) above. I hadn't noticed the ENSEMBL-only annotation (mentioned by Efstathios), which clarifies my assumption.
I still cannot find transcript IDs for the re-annotated pipeline (EnsemblReannotated, etc.), though. Maybe it's plainly obvious and I am just not seeing it?
I don't think there are any transcript IDs annotated in that package, and given that the probes are 50-mers, I sort of doubt many of them can be inferred to be transcript-specific anyway. But do note that the package does give the re-mapped genomic locations.
And you could pretty easily create a GRanges with those data, and then use findOverlaps() on the transcripts() from a TxDb that you could get by running makeTxDbFromBiomart(), to decide which transcript(s) a given probe will bind to.
Hi James,
Thank you for the tips, I thought that might be too time consuming, but your suggestions should get me there faster!
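For anyone following the thread, James's suggestion can be sketched roughly as below. This is only a minimal illustration on toy data, not the package's own method: the probe IDs, coordinates and transcript names are made up, and the "chr:start:end:strand" location format is an assumption that should be checked against what the package actually returns. In practice the toy `tx` object would be replaced by `transcripts(txdb)`, with `txdb` built by makeTxDbFromBiomart(), and the TxDb would need to use the same genome build and chromosome naming as the probe coordinates.

```r
library(GenomicRanges)

# Hypothetical probe locations in "chr:start:end:strand" form.
# Both the IDs and the format are assumptions for illustration only.
probe_loc <- c(
  ILMN_0000001 = "chr1:1000:1049:+",
  ILMN_0000002 = "chr1:5000:5049:-",
  ILMN_0000003 = "chr2:300:349:+"
)

# Split each location string into its four fields (one row per probe).
parts <- do.call(rbind, strsplit(unname(probe_loc), ":", fixed = TRUE))

# Build a GRanges object holding one range per probe.
probes <- GRanges(
  seqnames = parts[, 1],
  ranges   = IRanges(start = as.integer(parts[, 2]),
                     end   = as.integer(parts[, 3])),
  strand   = parts[, 4]
)
names(probes) <- names(probe_loc)

# Toy transcripts standing in for transcripts(txdb).
tx <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(900, 950, 4000, 1000),
                     end   = c(2000, 1020, 6000, 2000)),
  strand   = c("+", "+", "-", "+"),
  tx_name  = c("TX_A", "TX_B", "TX_C", "TX_D")
)

# Strand-aware overlap between probes (query) and transcripts (subject).
hits <- findOverlaps(probes, tx)

# Collect the overlapping transcript names per probe.
# Probes with no overlap do not appear in the result.
probe2tx <- split(tx$tx_name[subjectHits(hits)],
                  names(probes)[queryHits(hits)])
probe2tx
```

Note that overlapping the full transcript span also counts intronic positions. A stricter variant would overlap against exons grouped by transcript, e.g. from exonsBy(txdb, by = "tx").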