avereps(): 'ProbeName' or 'SystematicName' for agilent one-channel microarray?
kevin.m.hao
@kevinmhao-12183

Hello,

I am using limma to analyze one-channel Agilent microarray data from NCBI GEO (GSE57296), starting from the raw data.

At the avereps() step, I notice that one can average on either annotation column:

yave <- avereps(y0, ID = y0$genes[, "SystematicName"])  # "SystematicName"

OR

yave <- avereps(y0, ID = y0$genes[, "ProbeName"])  # "ProbeName"

But the results differ, since one SystematicName may correspond to many ProbeNames, as in the following example:

> G[SystematicName == "NR_003038"]
    Row Col ControlType      ProbeName SystematicName
 1:  13  12           0 A_19_P00316659      NR_003038
 2:  13  53           0 A_19_P00317984      NR_003038
 3: 106  26           0 A_19_P00319019      NR_003038
 4: 132 161           0 A_19_P00322944      NR_003038
 5: 133  78           0 A_19_P00321546      NR_003038
 6: 139  38           0 A_19_P00316419      NR_003038
 7: 152  79           0 A_19_P00322702      NR_003038
 8: 155   8           0 A_19_P00322754      NR_003038
 9: 161  13           0 A_19_P00317178      NR_003038
10: 224  48           0 A_19_P00321511      NR_003038
11: 231  52           0   A_23_P361085      NR_003038
12: 245  37           0 A_19_P00316541      NR_003038
13: 270   4           0 A_19_P00319095      NR_003038
14: 301  13           0 A_19_P00316701      NR_003038
15: 319  28           0 A_19_P00317473      NR_003038
16: 331   7           0 A_19_P00320094      NR_003038
17: 347 162           0 A_19_P00322666      NR_003038
18: 384  73           0 A_19_P00317743      NR_003038

So which one should be used in avereps(): ProbeName or SystematicName?

I noticed that the limma User's Guide uses "SystematicName", but http://matticklab.com/index.php?title=Single_channel_analysis_of_Agilent_microarray_data_with_Limma uses "ProbeName".

I think "SystematicName" is better, but I am not sure. Can you help me clear this up?

Thanks.

Kevin

limma microarray
@james-w-macdonald-5106
United States

The probe name is just that - the internal name given to a particular probe by the manufacturer. The systematic name is the transcript that a given probe is intended to measure. If you simply want to take the average of all duplicate probes on the array, then you should use the probe name. If you want to 'collapse' the information to individual transcripts/genes, then you should use the systematic name.

Which is 'better' depends on what you are trying to do.
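
To make the distinction concrete, here is a minimal sketch on a toy matrix. The probe and transcript labels (P1, T1, ...) and the expression values are made up for illustration; the only limma function used is `avereps()`, called on a plain matrix rather than on an EList.

```r
library(limma)

# Toy log-expression matrix: 4 spots (rows) x 2 arrays (columns).
# Hypothetical annotation: spots 1 and 2 are the same probe printed twice;
# probes P1 and P2 both target transcript T1, probe P3 targets T2.
E <- matrix(c(2, 4, 9, 1,
              3, 5, 7, 2), nrow = 4,
            dimnames = list(NULL, c("array1", "array2")))
probe      <- c("P1", "P1", "P2", "P3")
transcript <- c("T1", "T1", "T1", "T2")

# Average only the identical duplicate spots: one row per probe (3 rows).
by_probe <- avereps(E, ID = probe)

# Collapse everything aimed at the same transcript: one row per transcript (2 rows).
by_transcript <- avereps(E, ID = transcript)

print(by_probe)
print(by_transcript)
```

Averaging by probe keeps P1 and P2 as separate rows even though both target T1; averaging by transcript merges them into a single T1 row.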


Thanks James. But in what case should one take the average of all duplicate probes? If I understand correctly, in most situations one wants the differentially expressed genes (DEGs), not the probes, right? Even if one gets DE probes, they still have to be mapped to genes, so it is more direct to use 'SystematicName' to get DEGs, right? Thanks!


Well, the duplicate probes are measuring the exact same thing, and thus are true technical replicates. The set of probes that are intended to measure the transcript(s) from a particular gene are less so, and may in fact be intended to measure different splice variants.

There is very little to be gained from repeated measurements of the same thing, so you could argue that averaging the duplicate probes is a reasonable thing to do. You could also argue that the different probes (that may not be identical, and might measure different transcripts) are just measuring the amount of transcript from each gene, and if you don't care about the differences in the transcripts being measured (which may not be that different anyway), then it's reasonable to collapse those measurements to a single mean value.

Part of analyzing data involves making these sorts of decisions, and being able to explain what you did and why you did it. I can give you hypothetical arguments as to why one might want to do this or that, but in the end it's your analysis, and you will have to be responsible for what you did, and you will have to explain (to someone) what you did and why.


Or can one first use "ProbeName" to average the duplicated probes and then use "SystematicName" to focus on the genes/transcripts? That would be a two-step procedure. Is it reasonable to do this?
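
For what it is worth, the two-step procedure is easy to write down, and it does not always give the same number as a single pass on "SystematicName". The sketch below uses made-up values and hypothetical labels (P1, P2, T1) on a plain matrix; it only illustrates the arithmetic and is not a recommendation either way.

```r
library(limma)

# One array, three spots: probe P1 is printed twice, P2 once.
# Both probes are (hypothetically) annotated to the same transcript T1.
E <- matrix(c(2, 4, 9), ncol = 1, dimnames = list(NULL, "array1"))
probe      <- c("P1", "P1", "P2")
transcript <- c("T1", "T1", "T1")

# One step: average all three spots directly. (2 + 4 + 9) / 3 = 5
one_step <- avereps(E, ID = transcript)

# Two steps: first average duplicate spots per probe (P1 = 3, P2 = 9) ...
step1 <- avereps(E, ID = probe)
# ... then average the per-probe values per transcript. (3 + 9) / 2 = 6
keep <- !duplicated(probe)          # annotation rows matching rownames(step1)
two_step <- avereps(step1, ID = transcript[keep])

print(one_step)
print(two_step)
```

The two results differ whenever probes for the same transcript are replicated an unequal number of times: the one-step mean weights every spot equally, while the two-step mean weights every distinct probe equally.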

@gordon-smyth
WEHI, Melbourne, Australia

Personally, I do not usually average replicate probes. You should only do so if you have a particular reason for needing to do so. Otherwise the limma analysis will work perfectly well without averaging.

You haven't explained any reason why you want or need to average replicate probes, so the default position is not to do it.


Hi Gordon,

If one does not average replicate probes, then limma identifies differentially expressed probes (DEPs), right?

After obtaining the DEPs, how does one convert them to differentially expressed genes (DEGs)?

Thanks.


If any of the probes associated with a gene is DE, then the gene is DE. There's no need to do any transformation. Why is that a problem for you?
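
A rough sketch of that rule on simulated data, in case it helps. Everything here is an assumption made for illustration: the 100 simulated probes, the two-group design, the hypothetical gene labels G1 to G50, and the 0.05 cutoff on BH-adjusted p-values. The limma calls (`lmFit`, `eBayes`, `topTable`) are the standard probe-level pipeline.

```r
library(limma)
set.seed(1)

# Simulated data: 100 probes, 3 control and 3 treated arrays.
n_probes <- 100
group    <- factor(rep(c("control", "treated"), each = 3))
design   <- model.matrix(~ group)
E <- matrix(rnorm(n_probes * 6), nrow = n_probes)
E[1, 4:6] <- E[1, 4:6] + 10        # make probe 1 strongly different in "treated"

# Hypothetical probe-to-gene annotation: two probes per gene.
gene_id <- rep(paste0("G", 1:50), each = 2)

# Ordinary probe-level analysis, with no averaging of probes.
fit <- eBayes(lmFit(E, design))
tt  <- topTable(fit, coef = 2, number = Inf, sort.by = "none")  # original row order

# A probe is called DE if its adjusted p-value is below the (assumed) cutoff;
# a gene is called DE if any of its probes is DE.
probe_de <- tt$adj.P.Val < 0.05
gene_de  <- tapply(probe_de, gene_id, any)

print(names(gene_de)[gene_de])
```

Note that `sort.by = "none"` keeps the `topTable` rows in the same order as `gene_id`, which is what makes the `tapply` step valid.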


Dear Gordon

I have a problem "similar" to Kevin's. I would like to average (non-duplicate) probes per gene using the avereps function for a dual-color Agilent array.

However, I have probes that map to multiple genes (identically duplicated). In Kevin's case it would look like this:

   Row Col ControlType      ProbeName                  SystematicName
1:  13  12           0 A_19_P00316659 NR_003038, NR_003039, NR_003040
2:  13  53           0 A_19_P00317984                       NR_003039
3: 106  26           0 A_19_P00319019            NR_003038, NR_003042

I would like to keep these multi-mapping probes, because if I restrict to uniquely mapping probes, then some (identically duplicated) genes are thrown out of the analysis.

So how should I use the avereps function to also include multimapped probes when averaging (e.g. use probe 1 & 2 for NR_003039 and probe 1 & 3 for NR_003038)?

If I do "MA_average <- avereps(MA_normalized, ID = MA_normalized$genes$SystematicName)", it ignores the comma separator and treats e.g. "NR_003038, NR_003039, NR_003040" as a single ID.

Thank you for helping out!
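
One possible way to get the behaviour described above is to expand each multi-mapping probe into one row per listed ID before calling `avereps()`. This is only a sketch under assumptions: it works on a plain matrix of made-up values (for an MAList one might apply the same idea to the M-values), the comma-plus-optional-space separator is assumed from the example, and the same measurement then contributes to several averaged rows, so those rows are not independent.

```r
library(limma)

# Toy values for the three probes in the example table (2 arrays).
E <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 3,
            dimnames = list(NULL, c("array1", "array2")))
systematic <- c("NR_003038, NR_003039, NR_003040",
                "NR_003039",
                "NR_003038, NR_003042")

# Split each annotation string on commas (assumed separator).
ids <- strsplit(systematic, ",\\s*")

# Repeat each probe's row once per ID it maps to.
row_index <- rep(seq_along(ids), lengths(ids))
E_long    <- E[row_index, , drop = FALSE]
id_long   <- unlist(ids)

# Average per individual ID: NR_003038 uses probes 1 and 3,
# NR_003039 uses probes 1 and 2, and so on.
per_id <- avereps(E_long, ID = id_long)

print(per_id)
```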


This question is different because of the multi-mapping probes, so you should ask a new question on this forum instead of appending a comment to an old question.

When you post your own question, please explain why your platform has multi-mapping probes and what analysis you plan to do that requires you to have gene level results. Why can't you follow the same advice I gave to Kevin, which is to do the probe-level analysis and just take a gene to be DE if any of the probes belonging to the gene are DE?


Dear Gordon

Thank you for your time and reply. I have now posted a new question on the Bioconductor forum.

You can find it here: avereps with probes mapping to multiple genes for an Agilent dual color microarray.