Question

Resummarisation of Exon array normalised at probeset level to gene level

i.sudbery ▴ 40

@isudbery-8266

European Union

We wish to analyse an Exon Array dataset we obtained from a public source (unfortunately not GEO). The data we have is a matrix of RMA-normalised expression values from some 400 Exon arrays, summarised at the probeset level. We are only interested in the gene level and wondered whether there is any way to summarise to the gene level from this starting point.

normalization oligo limma • 1.3k views

ADD COMMENT • link updated 10.0 years ago by James W. MacDonald 68k • written 10.0 years ago by i.sudbery ▴ 40

score 0 · Answer 1 · 2015-06-30

James W. MacDonald 68k

@james-w-macdonald-5106

United States

The short answer is no. To summarize at the gene level using RMA requires the probe-level data. You could hypothetically group all the probesets for a given gene together and then summarize in some fashion, but the resulting values would not be the same as what you would get if you summarized using RMA.
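To make the "group and summarize in some fashion" idea concrete, here is a minimal sketch, written in Python purely for illustration rather than with any Bioconductor tooling. The probeset IDs, gene names, values, and the `summarise_to_gene` helper are all hypothetical, and the median is just one possible summary. This is not RMA: RMA fits its summary to probe-level intensities, so these values will differ from a true gene-level RMA summary.

```python
from collections import defaultdict
from statistics import median

# Hypothetical probeset-level log2 values: probeset ID -> one value per array.
probeset_expr = {
    "ps1": [7.0, 7.4, 6.8],
    "ps2": [7.2, 7.6, 7.0],
    "ps3": [6.8, 7.5, 6.9],
    "ps4": [5.0, 5.2, 5.1],
}

# Hypothetical annotation: probeset ID -> gene.
probeset_to_gene = {
    "ps1": "GENE_A",
    "ps2": "GENE_A",
    "ps3": "GENE_A",
    "ps4": "GENE_B",
}


def summarise_to_gene(expr, mapping, summary=median):
    """Collapse probeset rows to one row per gene, array by array."""
    by_gene = defaultdict(list)
    for probeset, values in expr.items():
        gene = mapping.get(probeset)
        if gene is not None:  # skip probesets with no gene annotation
            by_gene[gene].append(values)
    # zip(*rows) walks the arrays (columns); the summary is taken over
    # the probesets of the gene within each array.
    return {
        gene: [summary(column) for column in zip(*rows)]
        for gene, rows in by_gene.items()
    }


gene_expr = summarise_to_gene(probeset_expr, probeset_to_gene)
```

Passing `summary=statistics.mean` instead would give the per-gene mean; which summary (if any) is defensible depends on how the probesets for a gene behave in the actual data.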

ADD COMMENT • link 10.0 years ago James W. MacDonald 68k

Do you think that if I took the mean of the probesets for each gene, the resulting values would be valid for downstream limma analysis?

ADD REPLY • link 10.0 years ago i.sudbery ▴ 40

I actually have no idea. It's certainly one thing you can do, and it might not be the worst idea in the world, but ideally you would do some conventional EDA (exploratory data analysis) first to see if it looks like taking means is a reasonable thing to do.
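One possible form that EDA could take, sketched in Python for illustration only: check whether the probesets assigned to a gene actually track each other across arrays before averaging them. The data and the `min_pairwise_correlation` helper are hypothetical, and the correlation check is an assumption about what "reasonable" might mean here, not an established criterion.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def min_pairwise_correlation(rows):
    """Lowest correlation between any two probesets of one gene."""
    return min(pearson(a, b) for a, b in combinations(rows, 2))


# Hypothetical gene whose probesets rise and fall together across four arrays.
coherent = [
    [7.0, 7.4, 6.8, 8.0],
    [7.2, 7.6, 7.0, 8.2],
    [6.9, 7.3, 6.7, 7.9],
]

# Hypothetical gene whose two probesets move in opposite directions.
discordant = [
    [7.0, 7.4, 6.8, 8.0],
    [8.0, 6.8, 7.4, 7.0],
]

coherent_score = min_pairwise_correlation(coherent)
discordant_score = min_pairwise_correlation(discordant)
```

A gene like `discordant` is one where a plain mean would blur together probesets that disagree, so it would be worth a closer look before summarizing.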

An alternative would be to make comparisons at the probeset (exon-ish) level, and look for consistent differences over the set of probesets for each gene. The downsides to that approach are that the probesets usually have only four probes each, and the Exon arrays are much dimmer than the old 3'-biased arrays, so you have to wonder about the signal-to-noise ratio with just four dim probes per probeset. You also increase the multiple-testing burden quite a bit, which will not help things at all.
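As a rough illustration of "consistent differences over the set of probesets", the Python sketch below computes a per-probeset difference in group means and then the fraction of a gene's probesets that agree in direction. The group layout, values, and both helper functions are hypothetical, and this is a descriptive check only: it is not a statistical test and does nothing about the multiple-testing issue.

```python
from statistics import mean, median


def log_fold_change(values, group_a, group_b):
    """Difference in mean log2 expression, group_b minus group_a."""
    return mean(values[i] for i in group_b) - mean(values[i] for i in group_a)


def sign_agreement(lfcs):
    """Fraction of probesets whose difference shares the sign of the median difference."""
    centre = median(lfcs)
    if centre == 0:
        return 0.0  # no overall direction to agree with
    return sum(1 for lfc in lfcs if lfc * centre > 0) / len(lfcs)


# Hypothetical design: arrays 0 and 1 are controls, arrays 2 and 3 are treated.
control, treated = [0, 1], [2, 3]

# Hypothetical probeset-level log2 values for a single gene.
gene_probesets = {
    "ps1": [7.0, 7.2, 8.0, 8.2],
    "ps2": [6.0, 6.2, 6.9, 7.1],
    "ps3": [5.0, 5.2, 5.1, 4.9],
}

lfcs = [log_fold_change(v, control, treated) for v in gene_probesets.values()]
agreement = sign_agreement(lfcs)
```

Here two of the three probesets go up in the treated arrays and one barely moves in the other direction, which is the sort of pattern that could reflect either noise in a dim probeset or a genuine exon-level difference.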

Ideally you would go back to whoever submitted the data, and they would be oh so happy to supply you with the original CEL files. Is that in the cards?

ADD REPLY • link 10.0 years ago James W. MacDonald 68k

I'm going to ask. But the data comes from a massive consortium, and has been around for some time without being uploaded to GEO or similar, or even published. They have lots of data sets I'd like to get my hands on, but they only make summarized versions of all of them available, which is annoying because I'd really like to study genes that they have excluded from their analyses.

ADD REPLY • link 10.0 years ago i.sudbery ▴ 40