Dear Bioconductor members
I have been asked to analyze my lab's HTA 2.0 data once again, but this time to find information on specific transcript variants.
Do you know if it is possible to annotate an ExpressionSet obtained via data.rma = oligo::rma(data, target="probeset")? From what I understand, this applies RMA at the probeset level, without summarizing to transcript clusters. And can we perform DE analysis on transcripts?
For instance, how do I know which probes will be of interest for this kind of transcript: https://www.ncbi.nlm.nih.gov/nuccore/XM_011529084.2
I read in another thread ("Use junction probesets (JUC) to detect alternative splicing on Affymetrix human HTA 2.0 arrays?") that practically no one was interested in transcript analysis from microarray data, which would explain the lack of tools for this task. Is that really so? I find this quite surprising.
Thank you for your help,
Paul
Hi again,
So I annotated my dataset just to have a look, and it seems that hta20probeset.db does not really differentiate between transcripts, as it maps all RefSeq transcripts of a gene to every probeset corresponding to that gene.
The lookup returns IDs corresponding to two transcript variants and two precursors.
I suppose this is what makes
hta.dat <- annotateEset(hta.dat, hta20probeset.db)
fail (R crashes). Is this normal? Is there a trick that I missed? Or is it simply because Affymetrix did not provide the correct annotations for its probes?
Ah, you know what? The way these annotation packages work is completely anathema to how this array is supposed to work. The ChipDb package only has one real table, which maps the probeset ID to the Entrez Gene ID, and any other lookups will be farmed out to the org.Hs.eg.db package.
Since all previous Affy arrays (with the notable exception of the Exon ST arrays) were based on gene level expression, this was fine. But for these arrays it doesn't really work. For example the JUC probe you mention is only intended to measure two of the four transcripts for CD68:
But when we make the annotation package we map the RefSeq ID to its Entrez Gene ID
And any mapping from that probeset ID to anything else is based on a join to that table, so is by definition gene based, not transcript based.
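To make the consequence of that join concrete, here is a minimal base-R sketch. The probeset IDs, gene ID, and transcript IDs are made-up placeholders, not real HTA 2.0 or RefSeq identifiers, and the two data frames only mimic the layout described above (a probeset-to-gene table, plus a gene-to-transcript lookup of the kind held in org.Hs.eg.db).

```r
# Toy stand-in for the ChipDb table: probeset ID -> Entrez Gene ID.
# All IDs here are hypothetical placeholders.
probeset2gene <- data.frame(
  PROBEID  = c("JUC_toy_1", "PSR_toy_1", "PSR_toy_2"),
  ENTREZID = c("g1", "g1", "g1"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the org-level lookup: Entrez Gene ID -> RefSeq IDs.
gene2refseq <- data.frame(
  ENTREZID = rep("g1", 4),
  REFSEQ   = c("tx_A", "tx_B", "tx_C", "tx_D"),
  stringsAsFactors = FALSE
)

# Joining on the gene ID attaches every transcript of the gene to every
# probeset, regardless of which transcripts a probeset was designed to measure.
annot <- merge(probeset2gene, gene2refseq, by = "ENTREZID")

# Collect the transcripts reported for each probeset.
per_probeset <- split(annot$REFSEQ, annot$PROBEID)
print(per_probeset)
```

In this toy case every probeset comes back with the same set of transcripts, which is the gene-based behaviour described above.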
So as they stand, the probeset.db packages are probably not that useful. OTOH, it's a fairly rare thing for people to want to do what you want to do, so I don't know if it's worth putting limited resources towards making a special ChipDb type that is transcript rather than gene centric.
I should also mention that annotateEset works for me, so I cannot reproduce any crash. You also have access to all the annotation that Affy provides, which you can get using the getNetAffx function in oligo, and which you can parse to get all the transcripts that each probeset is supposed to interrogate.
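As a rough sketch of that parsing step, the helper below assumes that a multi-valued NetAffx column (for example one named mrnaassignment) separates individual assignments with " /// " and the fields within an assignment with " // ", with the accession as the first field. That layout, the column name, and the toy string are assumptions to check against what getNetAffx actually returns for your data, and the helper name parse_assignments is made up for this example.

```r
# Hypothetical helper: pull the first field (assumed to be the accession)
# out of each assignment in a NetAffx-style string.
# Assumed layout: assignments separated by " /// ", fields by " // ".
parse_assignments <- function(x) {
  # Treat missing, empty, or "---" entries as "no assignment".
  if (is.na(x) || !nzchar(x) || x == "---") return(character(0))
  assignments <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  fields <- strsplit(assignments, " // ", fixed = TRUE)
  unique(vapply(fields, function(f) trimws(f[1]), character(1)))
}

# Toy input with placeholder accessions (not real annotation).
toy <- "tx_A // src1 // desc A /// tx_B // src1 // desc B /// tx_A // src2 // desc A"
parse_assignments(toy)

# With real data one would apply this per probeset, e.g. (not run;
# the column name is an assumption):
# featureData(eset) <- getNetAffx(eset, "probeset")
# tx <- lapply(pData(featureData(eset))$mrnaassignment, parse_assignments)
```

Comparing the resulting per-probeset transcript lists between probesets of the same gene shows whether the annotation distinguishes them at all.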
But this turns out, for this gene, to be exactly what you already would get from the hta20probeset.db package:
So apparently all of the PSR and JUC probesets for this gene measure just two of the known transcripts? I don't have time to explore further, but this should give you some hints as to how you can proceed on your own.
Thank you very much for these explanations. As I suspected, it had something to do with how the ChipDb package is built.
Thank you for your time,
Paul
I think you missed my point. At least for this particular gene, we are supplying EXACTLY what Affy supplies, which is the same exact data for all of the probesets. So unless this is an outlier, it's not clear that Affy supplies the data required to do any transcript work anyway.
"The ChipDb package only has one real table, which maps the probeset ID to the Entrez Gene ID, and any other lookups will be farmed out to the org.Hs.eg.db package."
This is what I was referring to when I said that. As you said, Affymetrix does not provide detailed transcript-level information for its HTA chips (even though they sell them as being great for looking at transcripts). Well, anyway, I cannot do what I was asked (which I will explain to my colleague). The only solution left is to look at Affymetrix's TAC software.