Question: Alternate expression of splice isoforms on Affy Clariom D assay
mforde8410 wrote (2.8 years ago):

Hello,

I have a quick question about analyzing the Affy Clariom D chip, where we are interested in quantifying splice isoforms (e.g., exon skipping, intron retention, alternative 3' and alternative 5' events). I've done gene-level analyses before, but not transcript-level ones, so I assume this workflow has some nuances that the gene-level one doesn't. I see that there is a transcript annotation package available for the chip, clariomdhumantranscriptcluster.db. So I'm curious whether mapping probes with it, then analyzing with limma and limma's diffSplice, is sufficient to pick up these types of events.

Martin

Modified 2.8 years ago by James W. MacDonald; written 2.8 years ago by mforde8410
Answer: Alternate expression of splice isoforms on Affy Clariom D assay
James W. MacDonald (United States) wrote (2.8 years ago):

The transcript annotation package is intended for summarization at the transcript (transcript cluster) level, i.e., what you are calling the 'gene' level. If you want to look at splice isoforms, you would in principle summarize at the probeset level instead. That gives you summaries of probesets that measure either what Affy calls probe set regions (PSRs) or exon-exon junctions (junction probesets, which are intended to give evidence that a given junction exists). In that case you want to annotate using the clariomdhumanprobeset.db package.

The PSR probesets all start with PSR and the junction probesets all start with JUC, and do note that the two types are supposed to measure different things. The JUC probesets contain probes that span an exon-exon junction and are intended to measure 'how much' of that junction exists. The PSR probesets instead measure the abundance of a portion of a given exon: if the exon is short, one PSR may measure the abundance of the whole exon, but if it's long, there may be multiple PSRs for that exon. If you are going to analyze these arrays, it's worth taking the time to familiarize yourself with these intricacies.
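Because the probeset type is encoded in the ID prefix, it is easy to split probeset-level results into PSR and JUC sets before analyzing them. This is a minimal sketch assuming the prefix convention described above; the helper name `probeset_type` is hypothetical, and the example IDs are made up to follow the prefixes. Anything matching neither prefix (e.g. control probesets) is lumped into "other".

```r
# Hypothetical helper: classify Clariom D probeset IDs by their prefix.
probeset_type <- function(ids) {
  type <- ifelse(startsWith(ids, "PSR"), "PSR",   # probe set regions (pieces of exons)
          ifelse(startsWith(ids, "JUC"), "JUC",   # junction probesets
                 "other"))                        # anything else, e.g. controls
  factor(type, levels = c("PSR", "JUC", "other"))
}

# Made-up IDs that follow the prefix convention
ids <- c("PSR0100000001.hg.1", "JUC0100000001.hg.1", "AFFX-BioB-5_at")
table(probeset_type(ids))
```

With a real probeset-level ExpressionSet you could pass `featureNames(eset)` to the helper and subset the set with the result, so that PSR and JUC probesets can be examined separately.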


I should also note that these arrays have lots of speculative content: only about 55% of the probes are annotated. In addition, we annotate these things in a fairly bulk, NCBI-centric way. Basically, I parse out the annotation IDs available in the annotation CSV files and then run them through the AnnotationForge pipeline to make the annotation package. So things like Ensembl transcript IDs or HAVANA IDs are not likely to be annotated.
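To see how much of the array is annotated in your own data, you can count the probesets that have a usable value in an annotation column. This is a sketch under assumptions: the column name `SYMBOL`, and missing values showing up as `NA`, an empty string, or the `---` placeholder, are assumptions about how the annotation comes out; the small data frame is a toy stand-in for something like `fData(eset)`.

```r
# Fraction of probesets with a usable annotation value in one column.
# Assumptions: the column is called "SYMBOL", and missing values appear as
# NA, "" or the "---" placeholder.
annotated_fraction <- function(ann, col = "SYMBOL") {
  vals <- as.character(ann[[col]])
  has_value <- !is.na(vals) & nzchar(vals) & vals != "---"
  mean(has_value)
}

# Toy stand-in for fData(eset); IDs and symbols are illustrative only
ann <- data.frame(
  PROBEID = c("PSR0100000001.hg.1", "PSR0100000002.hg.1",
              "JUC0100000001.hg.1", "JUC0100000002.hg.1"),
  SYMBOL  = c("TP53", NA, "BRCA1", "---"),
  stringsAsFactors = FALSE
)
annotated_fraction(ann)
```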

An alternative way to annotate your data is to use the annotateEset function in my affycoretools package with the pdInfoPackage for this array. The pdInfoPackage contains the same annotation data, and annotateEset will simply parse out whatever is in there, without being NCBI-centric. So something like:

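```r
library(oligo)          # read.celfiles(), list.celfiles(), rma()
library(affycoretools)  # annotateEset()

dat <- read.celfiles(list.celfiles())          # all CEL files in the working directory
eset <- rma(dat, target = "probeset")          # probeset-level (PSR and JUC) summaries
eset <- annotateEset(eset, annotation(eset))   # annotate from the pdInfoPackage
```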

This will populate the featureData slot of your ExpressionSet. limma will pick this up (lmFit copies it into the genes component of the fit), and you can also extract it with fData if you want to use diffSplice.
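A minimal sketch of the diffSplice step, using limma's `lmFit`, `diffSplice` and `topSplice`. To keep it self-contained it runs on simulated data (pure noise, so there is no real splicing signal) rather than on the ExpressionSet above. The column names `GeneID` and `ProbesetID` are hypothetical. Which fData column actually groups probesets into genes for this array is an assumption you would need to check.

```r
library(limma)

set.seed(1)
# Simulated stand-in for probeset-level log2 data: 3 genes x 3 probesets, 6 arrays.
# The column names below are hypothetical; with real data, check what
# annotateEset actually put into fData(eset) before choosing a gene column.
feat <- data.frame(
  ProbesetID = sprintf("PSR%02d", 1:9),
  GeneID     = rep(c("geneA", "geneB", "geneC"), each = 3),
  stringsAsFactors = FALSE
)
y <- matrix(rnorm(9 * 6, mean = 8), nrow = 9,
            dimnames = list(feat$ProbesetID, paste0("array", 1:6)))
group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

fit <- lmFit(y, design)
# lmFit(eset, design) would copy fData(eset) into fit$genes; here we attach it by hand
fit$genes <- feat

# Within each gene, test whether probesets differ in their treated-vs-control change
ex <- diffSplice(fit, geneid = "GeneID", exonid = "ProbesetID", verbose = FALSE)
ts <- topSplice(ex, coef = 2, test = "simes")
ts
```

With `test = "simes"` topSplice reports one row per gene; `test = "t"` would instead rank individual probesets within genes.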

Reply written 2.8 years ago by James W. MacDonald

Thank you for the useful clarification on the annotation packages. If you don't mind, I have a couple of follow-up questions. I believe the array in question also has exon-intron JUC probes. Would diffSplice in the limma package only work for quantifying exon-exon junctions from PSR probes, or could it also incorporate JUC probes to call other types of alternative splicing, e.g., intron retention? If not, are there other analysis options for quantifying non-exon-exon splice isoforms from a combination of JUC and PSR probes? Or would I need to do something like generate a custom transcriptome annotation that specifies the splice isoforms?

Reply modified 2.8 years ago; written 2.8 years ago by mforde8410

Well, if you are willing to consider the JUC probes equivalent to the PSR probes in some sense, then diffSplice should work. For example, if you have an exon-intron JUC probe that is way up in some samples and way down in others, that's some indication that the intron was retained in some of the samples. In that respect it's probably fine, but I haven't really thought much about it, as our core tries its best to avoid these arrays (as we did with the Exon and HTA arrays).
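To make the intron-retention reasoning concrete, here is a deliberately simplified illustration. It is not what diffSplice computes (diffSplice fits a proper moderated test); it only shows the "relative to the rest of the gene" idea. For each probeset it takes the log2 fold change between two groups and subtracts the median fold change of all probesets in the same gene. A probeset that moves while the rest of its gene doesn't, like the hypothetical exon-intron JUC probeset below, stands out, while a gene that shifts as a whole cancels out. The function name, gene grouping and data are all made up.

```r
# Simplified illustration, NOT diffSplice: each probeset's log2 fold change
# minus the median log2 fold change of all probesets in the same gene.
relative_lfc <- function(expr, gene, group) {
  # expr:  log2 matrix, probesets in rows, samples in columns
  # gene:  gene ID for each row of expr
  # group: two-level factor for the columns; fold change is level 2 vs level 1
  lev <- levels(group)
  lfc <- rowMeans(expr[, group == lev[2], drop = FALSE]) -
         rowMeans(expr[, group == lev[1], drop = FALSE])
  lfc - ave(lfc, gene, FUN = median)
}

# Made-up data: geneX has a hypothetical exon-intron JUC probeset that rises
# only in group B; geneY shifts up by 1 as a whole.
expr <- rbind(
  PSR_x1            = c(8, 8, 8, 8),
  PSR_x2            = c(7, 7, 7, 7),
  JUC_x_exon_intron = c(5, 5, 8, 8),
  PSR_y1            = c(6, 6, 7, 7),
  PSR_y2            = c(9, 9, 10, 10)
)
gene  <- c("geneX", "geneX", "geneX", "geneY", "geneY")
group <- factor(c("A", "A", "B", "B"))
round(relative_lfc(expr, gene, group), 2)
```

Only the exon-intron probeset ends up far from zero here. A real analysis still needs the variance modeling that diffSplice provides.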

The main problem with these arrays is probably going to be figuring out what all these probesets are supposed to be measuring. I am not sure that either the pdInfoPackage or the annotation package really gives you a way to figure out what's what. So I imagine you will have to spend a lot of time generating some sort of data resource that tells you what each of these things is supposed to measure.

Reply written 2.8 years ago by James W. MacDonald