Use of RMA to get exon-level summaries for HTA 2.0
relathman ▴ 20
@relathman-11472
Germany

Dear community,
is it possible to use RMA to get exon-level summaries for the HTA 2.0 platform in Bioconductor?

I would like to run diffSplice() from limma to detect genes that have evidence for differential splicing between two conditions and I need an expression matrix with counts at the exon level for that.
I have already used RMA on the probeset/transcript cluster level using the Affymetrix hta20 annotation data (hta20transcriptcluster.db) but I couldn’t find a Bioconductor annotation data package for the exon level for HTA 2.0.

I have also used a custom CDF for HTA 2.0 from Brainarray (which maps transcript cluster IDs to exon IDs) for the FIRMA analysis (method for the detection of alternative splicing from exon array data) as implemented in the Aroma Affymetrix framework.
However, I would like to compare my FIRMA results with those of an alternative method, preferably one based on Bioconductor packages.

I would be very thankful for any help!

@james-w-macdonald-5106
United States

Sort of. If you summarize at the 'probeset' level using the oligo package, then you will get probe set region (PSR) and junction probeset (JUC) summaries. These are not exon level, in general, as you can have one or more PSRs per exon, and the JUC probes measure exon-exon boundaries, which by definition are not exon-level.
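A minimal sketch of that probeset-level summarization, assuming the oligo and pd.hta.2.0 packages are installed and that `cel_dir` (a hypothetical path) holds your HTA 2.0 CEL files. The function name `summarize_hta_psr` is made up for illustration. The rows of the result are PSR and JUC probesets, not exons, so you would still need a mapping from probesets to transcript clusters (not shown) before anything like diffSplice.

```r
# Hypothetical helper: RMA-summarize HTA 2.0 CEL files at the probeset
# (PSR/JUC) level with oligo. Nothing is read until the function is called.
summarize_hta_psr <- function(cel_dir) {
  cel_files <- list.files(cel_dir, pattern = "\\.CEL$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(cel_files) == 0) {
    stop("No CEL files found in: ", cel_dir)
  }
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("The Bioconductor package 'oligo' is required")
  }
  # oligo loads the matching pdInfo package (pd.hta.2.0) when reading the files
  raw <- oligo::read.celfiles(cel_files)
  # target = "probeset" gives PSR/JUC summaries; target = "core" would give
  # transcript-cluster summaries instead
  oligo::rma(raw, target = "probeset")
}
```

Calling `summarize_hta_psr("path/to/cel/files")` should return an ExpressionSet, and `Biobase::exprs()` on it gives the probeset-by-sample matrix of log2 values.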

But if MBNI have some exon-level cdf packages, then I suppose you could summarize using one of those and then use diffSplice. I don't know if Manhong is making pdInfo packages for those re-mapped CDFs, but if not you could hypothetically use the GenericArray pipeline in pdInfoBuilder to make a pdInfo package yourself. But do note that this pipeline isn't well documented, so you will need to read the code and infer what you need to do from what is there.

That would probably be a bit of work, and to what end? What do you expect to gain by making this comparison, and what are your criteria for saying (like, anything) about the similarities and differences between FIRMA and diffSplice? In other words, say you get like 50 genes that look pretty similar between the two methods, and 100 that are completely different (say 50 uniquely differentially spliced genes for FIRMA and diffSplice, respectively). What does that mean? Does that mean the methods agree or disagree? Which one is 'better'? Without knowing which genes are actually being differentially spliced you will probably just get some similarities and some differences and be left wondering what exactly all that means.


I would have liked to find some of my top candidates for alternatively spliced genes from the FIRMA analysis in the diffSplice results as well without judging the quality of the methods themselves. I assume that as long as we don’t know the identity of the spliced genes beforehand, nothing but experimental testing can really “verify” the use of alternative exons in a sample and till then, every hit can only be a possible candidate.

I would also like to use the plotSplice function from limma for a visualisation of the results.

Could I also use the exon level data from the FIRMA analysis for diffSplice? Is there another method you would recommend for the identification of putative alternatively spliced genes in HTA 2.0? Or would you advise against using this platform – or even microarrays at all – for this purpose?

It seems that most methods for the detection of alternative splicing are based on sequencing nowadays. However, the Affymetrix data sheet for the HTA 2.0 claims that this platform is suitable for the detection of alternatively spliced exons (and even splice-junctions). I know that their TAC software offers some functionality for that but I would have preferred to use a Bioconductor-based solution.


Any comparison between the same samples using different methods is by definition only a comparison of the methods. A simplistic example would be to measure your weight using a bathroom scale, and then to measure it again using one of those scales they have at the doctor's office. Any difference in your weight between the two is only attributable to inherent differences in the scales (the measurement method) rather than anything else. And if they don't agree, what of it? If my bathroom scale says I weigh 175, and the doctor's scale says 170, what is my weight?

Anyway, enough philosophy. I don't use aroma.affymetrix, so I have no idea what form the FIRMA results take, but presuming you get counts/exon then you should be able to feed those data to diffSplice.
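As a rough sketch of what feeding exon-level data to diffSplice could look like, the following uses a simulated log2 expression matrix in place of real FIRMA or RMA output. All of the data, the gene and exon identifiers, and the spiked-in effect are invented for illustration. With real data you would substitute your own matrix plus a vector giving the gene (transcript cluster) for each row.

```r
# Simulated stand-in for an exon-level log2 expression matrix:
# 50 genes x 5 exons each, 4 samples per group. All values are made up.
set.seed(1)
n_genes <- 50
exons_per_gene <- 5
n_per_group <- 4
gene_id <- rep(paste0("gene", seq_len(n_genes)), each = exons_per_gene)
exon_id <- paste0(gene_id, "_ex", rep(seq_len(exons_per_gene), times = n_genes))
group <- factor(rep(c("A", "B"), each = n_per_group))
expr <- matrix(rnorm(length(exon_id) * length(group), mean = 7, sd = 0.3),
               nrow = length(exon_id),
               dimnames = list(exon_id, paste0("s", seq_along(group))))
# Spike one exon of gene1 in group B to mimic differential exon usage
expr["gene1_ex3", group == "B"] <- expr["gene1_ex3", group == "B"] + 2

if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ group)
  fit <- limma::lmFit(expr, design)
  # Tests each exon's log-fold change against the other exons of the same gene
  ex <- limma::diffSplice(fit, geneid = gene_id, exonid = exon_id,
                          verbose = FALSE)
  # Gene-level ranking using the Simes adjustment of exon-level p-values
  top_genes <- limma::topSplice(ex, coef = 2, test = "simes", number = 5)
  print(top_genes)
  # Optional visualisation of one gene:
  # limma::plotSplice(ex, coef = 2, geneid = "gene1")
}
```

The same `ex` object is what plotSplice takes, so the visualisation asked about earlier comes along for free once diffSplice has run.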

As for whether the HTA/Clariom D platform is useful for this, I have my opinions, and they are pretty strong, and not at all positive. So I'll just make two points. First, there isn't a really credible way to analyze these data the way they are meant to be analyzed. Hypothetically you could synthesize the signal from all the PSRs and JUCs that make up each differentially spliced transcript, and then make comparisons to see if there is any evidence for differential transcript abundance, or exon usage or whatever. Unfortunately, the days when a newly minted PhD statistician could make a name for herself by coming up with a sweet new method for analyzing microarray data are long since gone. All the cool kids are off playing with single cell RNA-Seq data these days, because that's cutting edge technology. Microarrays are so last decade.

Second, for those people who don't have like 5 R01s, there just isn't enough money to get and analyze enough samples to have power to detect differential transcript abundance (regardless of the platform). You might have a chance at differential exon usage, but even then I wonder. The problem has to do with the fact that as you measure smaller and smaller portions of a gene, the uncertainty in your measurement goes up. And the only way to counteract that is to have more observations. I don't think that the 'usual' experiment with say 3 or 5 samples per group is going to be sufficiently powered to reliably detect any differential splicing. You might luck out and get some signal for a gene or two, but in general I would think that any signal would be drowned in noise.
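To put rough numbers on that last point, here is a back-of-the-envelope sketch in base R. It makes strong simplifying assumptions that are not from this thread: each summary is a plain average of independent probes with equal noise (RMA actually uses median polish), biological variation is ignored, and the probe counts, effect size and significance level are invented. It only illustrates the direction of the effect.

```r
# Standard error of a summary that averages n_probes independent probes,
# each with the same (assumed) measurement noise probe_sd.
summary_se <- function(n_probes, probe_sd = 1) {
  probe_sd / sqrt(n_probes)
}

# Power of a two-sample t-test when the per-sample noise equals that
# standard error. delta is the assumed true log2 difference.
splice_power <- function(n_probes, n_per_group, delta = 1, probe_sd = 1,
                         sig_level = 0.001) {
  power.t.test(n = n_per_group, delta = delta,
               sd = summary_se(n_probes, probe_sd),
               sig.level = sig_level)$power
}

# Hypothetical probe counts: a whole-gene summary versus ever smaller regions
for (n_probes in c(40, 10, 4)) {
  cat(sprintf("probes = %2d  SE = %.3f  power at n = 3 per group = %.2f\n",
              n_probes, summary_se(n_probes), splice_power(n_probes, 3)))
}
```

Under these assumptions the standard error grows as the number of probes behind a summary shrinks, and the only levers left to recover power are a larger effect or more samples per group (try `splice_power(4, 10)` against `splice_power(4, 3)`).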