I have a set of old Agilent microarray gene expression data (e.g. GSE80337) and would like to reanalyze it using an updated annotation. (FYI, the original array was designed so that each transcript had 2 to 3 probes.)
I first mapped the array probes against the new annotation (cDNA) using Bowtie2 (restricting to those probes that have a perfect match). Based on the Bowtie2 output, some of these probes map to multiple genes/IDs (multi-mapping probes).
Now I would like to use the avereps function to average the probe signals for each gene, rather than working with individual probes, because I want to do GO enrichment and to "easily" compare with other expression (RNA-seq) data.
So my question is: how should I use avereps so that multi-mapping probes are also included when averaging (e.g. use probes 1 & 2 for NR_003039 & NR_0030038, and probes 1, 3 & 4 for NR_003041 in the Agilent raw file example below)?
    Row  Col  ControlType  ProbeName  SystematicName (ID)
1:   13   12            0  probe_1    NR_003038, NR_003039, NR_003041
2:   13   53            0  probe_2    NR_003039, NR_0030038
3:  106   26            0  probe_3    NR_003041
4:  108   31            0  probe_4    NR_003041
If I do "MA_average <- avereps(MA_normalized, ID=MA_normalized$genes$SystematicName)", it ignores the comma separator and treats e.g. "NR_003038, NR_003039, NR_003041" as a single ID...
Other posts suggest simply removing multi-mapping probes, but then I would lose genes that are 100% identical duplicates and therefore have only multi-mapping probes (NR_0030038/39 in this example)...
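To make the intended result concrete, here is a minimal sketch of one possible approach: expand the data so that each (probe, ID) pair gets its own row, then let avereps average over the expanded IDs. It runs on a plain matrix with made-up expression values, and it assumes that "NR_0030038" in my example is a typo for "NR_003038". The object names (genes, expr, row_idx, ids_long) are hypothetical, and I have not verified this on the real MAList.

```r
library(limma)

# Toy annotation mirroring the example table (comma-separated IDs per probe).
# Assumption: "NR_0030038" in the post is a typo for "NR_003038".
genes <- data.frame(
  ProbeName = c("probe_1", "probe_2", "probe_3", "probe_4"),
  SystematicName = c("NR_003038, NR_003039, NR_003041",
                     "NR_003039, NR_003038",
                     "NR_003041",
                     "NR_003041"),
  stringsAsFactors = FALSE
)

# Made-up log-expression values for two arrays (illustration only).
expr <- matrix(c(1, 3, 5, 9,
                 2, 4, 6, 10),
               ncol = 2,
               dimnames = list(genes$ProbeName, c("array1", "array2")))

# Split each ID string on commas and trim the surrounding whitespace.
id_list <- lapply(strsplit(genes$SystematicName, ","), trimws)

# One row per (probe, ID) pair: repeat each probe row once per mapped ID.
row_idx  <- rep(seq_along(id_list), lengths(id_list))
ids_long <- unlist(id_list)
expr_long <- expr[row_idx, , drop = FALSE]

# Average all rows that share an ID; probe_1 now contributes to three IDs.
avg <- avereps(expr_long, ID = ids_long)
print(avg)

# For an MAList the same indices could presumably be reused (untested sketch):
#   MA_long <- MA_normalized[row_idx, ]
#   MA_average <- avereps(MA_long, ID = ids_long)
```

With this layout a multi-mapping probe is counted once for every ID it maps to, so identical duplicate genes end up with identical averaged values instead of being dropped.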
Thank you for helping out!