There isn't a standard way that I am familiar with. But this
illustrates a conceptual difference between the purpose of these
arrays and what people end up using them for.

I have run headlong into this issue lately, trying to create
packages for the new 2.X ST arrays. The annotations for these arrays
are primarily directed towards the _transcripts_ that a given probeset
measures, rather than the underlying gene. So the data we get from
these arrays are supposed to represent the relative abundance of a
given transcript, and the 'duplicate' probesets on the array are
supposed to measure transcript variants (at least I assume this is in
general true, as the new TAC software is supposed to work with Gene ST
arrays).

We know that there actually are transcript variants for various genes,
and that these variants may give rise to phenotypic differences. So it
may well be interesting to measure these variants and try to figure
out if they have a meaningful effect on a phenotype we might be
interested in.

However, 100% of the researchers I come into contact with are
completely uninterested in such things, and just want to know if there
are differences in expression at the _gene_ level. This is true BTW for
RNA-Seq as well. This may have more to do with the crowd I run with,
rather than the general desires of the average biologist, so I may
be suffering from confirmation bias here.

But I think it is a bit ironic that Affymetrix keeps trying to push
transcript level data on us (Exon arrays, Gene ST arrays, now HTA
arrays), and we push back just as hard, collapsing all these data to
gene level. I am not sure if this is a lack of imagination on our part
or a failure to understand the customer on Affy's part. Or maybe it's
just that I don't hang with the cool kids.
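
For what it's worth, the naive approach you describe (taking the first-ranked mRNA assignment for each transcript cluster) can be sketched roughly as below. This is only an illustration, not the script quoted further down: firstAccession is a hypothetical helper, the column names transcript_cluster_id and mrna_assignment and the " /// " and " // " separators are my assumptions about the Affy csv layout, and the rows are made up.

```r
# Hypothetical helper: return the first-ranked accession from an
# mrna_assignment-style string. Assumes assignments are separated by
# " /// " and the fields within one assignment by " // ", with the
# accession in the first field; "---" is treated as "no annotation".
firstAccession <- function(x) {
    x <- as.character(x)
    first <- sub(" /// .*$", "", x)    # keep only the top-ranked assignment
    acc <- trimws(sub(" // .*$", "", first))  # keep only its accession field
    acc[is.na(acc) | acc %in% c("", "---")] <- NA_character_
    acc
}

# Made-up stand-in for the annotation table (column names are assumptions)
annot <- data.frame(
    transcript_cluster_id = c("1001", "1002", "1003"),
    mrna_assignment = c(
        "NM_000001 // RefSeq // made-up gene A /// XM_000009 // RefSeq // made-up gene B",
        "---",
        "NM_000002 // RefSeq // made-up gene C"),
    stringsAsFactors = FALSE)

# Two-column table: probeset ID first, first-ranked accession second
twoCol <- data.frame(probeset = annot$transcript_cluster_id,
                     accession = firstAccession(annot$mrna_assignment),
                     stringsAsFactors = FALSE)

# Drop probesets that have no assignment at all
twoCol <- twoCol[!is.na(twoCol$accession), ]
```

Whether the first entry is really the best one to keep is exactly the open question, so treat this as a way to inspect what the ranking gives you rather than as a recommendation.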
On Friday, October 04, 2013 1:29:36 PM, Joao Sollari Lopes wrote:
> Hi Jim,
> Following on the discussion on annotation in Affymetrix Gene ST
> arrays, I wonder if there is a standard way to deal with multiple
> mRNAs (from different genes) that are assigned to the same
> cluster. Is it generally accepted to follow the naive approach of
> picking the first mRNA in the list?
> I know that the mRNA Assignments are ordered in a ranking, so is it
> safe just to rely on the ranking already performed by Affymetrix?
> On 08/29/2013 04:22 PM, James W. MacDonald wrote:
>> Hi Joao,
>> Unfortunately there are no readily available packages for
>> all the new model organism arrays from Affy. However, the functions
>> to create your own annotation package do exist. If you look at the
>> AnnotationForge package, specifically the SQLForge vignette,
>> it is pretty straightforward to make your own annotation package.
>> I am assuming you are summarizing at the transcript level, so you would
>> want to make a zebgene11sttranscriptcluster.db package. For this you
>> will need the transcript csv file from Affy.
>> From this you want to generate a two-column file with the probeset IDs
>> in the first column, and then GenBank or RefSeq IDs in the second.
>> This is the tough part, as the annotation files need to be parsed to
>> create this file.
>> I wrote an Rscript to parse these files that you could use. It is
>> pretty naive, but seems to do a relatively reasonable job. You will
>> obviously need to change the first line to point to the correct
>> directory, and will have to have the org.Dr.eg.db package installed,
>> but this should
>> <copy from below>
>> args <- commandArgs(TRUE)
>> if(length(args) < 3) stop(paste("Usage: parseAffyTranscripts.R
>> <transcript.csv> <organism.db package>