Hi
I have a set of old Agilent microarray gene expression data (e.g. GSE80337) and would like to reanalyze the results using an updated annotation. (FYI, the original array design was done in such a way that each transcript had 2 to 3 probes.)
I first mapped the array probes against the new annotation (cDNA) using Bowtie2 (restricting to those probes that have a perfect match). Based on the Bowtie2 output, some of these probes map to multiple genes/IDs (multi-mapping probes).
Now I would like to use the avereps function to group/average the probe signal for each gene (rather than analysing probes individually, since I would like to do some GO enrichment and to "easily" compare with other expression (RNA-seq) data).
So my question is, how should I use the avereps function to also include multimapped probes when averaging (e.g. use probe 1 & 2 for NR_003039&NR_0030038/probe 1,3 & 4 for NR_003041 in the Agilent raw file example below)?
    Row Col ControlType ProbeName SystematicName(ID)
1:   13  12           0   probe_1 NR_003038, NR 003039, NR_003041
2:   13  53           0   probe_2 NR_003039, NR_0030038
3:  106  26           0   probe_3 NR_003041
4:  108  31           0   probe_4 NR_003041
If I do "MA_average <- avereps(MA_normalized, ID=MA_normalized$genes$SystematicName)" then it ignores the comma separator and treats e.g. "NR_003038, NR 003039, NR_003041" as one ID...
Other posts suggest removing multi-mapping probes, but then I would exclude those genes that are 100% identical duplicates (and thus have only multi-mapping probes, NR_0030038/39 in this example)...
Thank you for helping out!
I missed that one of your GenBank IDs was missing the underscore:
Dear James
Thank you for your reply.
However, I copied (and modified) the raw Agilent file example from another topic (see here).
In my array design for a non-model arthropod species, all probes map to the CDS of a protein coding gene (and not to introns or snoRNAs).
My apologies for the confusion.
Well that's pretty confusing. You copied something that includes some GenBank IDs that are for human, but the array is in fact a spider mite array? OK, whatever. The main point here is that you have a decision to make as the analyst, and you are asking on a site intended for the support of people who are having technical problems with software.
Your question has nothing to do with the software, but instead is a decision you need to make as the analyst. If you are unable to make that decision yourself you should find someone local with experience who can help you. Asking random people on the internet how you should analyze your data is a suboptimal way to proceed.
Dear James
My apologies for the confusion. I did not know that these were well-established human IDs, but thought these were random IDs.
However, I do know the decision I want to make. I want to average/group probes, including multi-mapping ones, for each spider mite gene. I just could not figure out how to do it, and I thought that this forum, on which many experts (like you and Mr. Smyth) and not some random people are active, might help me with my problem.
In that case you can't use avereps, but instead will have to use regular R functions to do what you want. So assuming you want an average for each unique GenBank (or whatever) ID you have, something like (untested)
Thank you James for the proposed solution, but for some reason R always stops after running the third "mapper" line.
I also do not completely understand the code. You make a factor of each ID in the SystematicName column, but I do not understand how mapper is then used in the 4th and 5th lines...
Like I said, 'untested'. The first line should be strsplit, not split. Here is a trivial example that might be more explanatory:
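A minimal sketch along those lines (not necessarily the code originally posted), using only base R. The gene IDs and expression values are made up, and the object names (ids, expr, mapper, gene_means) are hypothetical, chosen for illustration. The idea is to split each comma-separated ID string with strsplit, build a probe-to-gene table, repeat each probe row once per gene it maps to, and then average within each gene:

```r
# Toy annotation: one comma-separated ID string per probe (hypothetical IDs)
ids <- c("geneA, geneB, geneC", "geneB, geneD", "geneC", "geneC")

# Toy log-expression matrix: 4 probes x 2 arrays (made-up numbers)
expr <- matrix(c(1, 2, 3, 4,
                 5, 6, 7, 8),
               nrow = 4,
               dimnames = list(paste0("probe_", 1:4), c("array1", "array2")))

# Split each ID string on commas and trim surrounding whitespace
id_list <- lapply(strsplit(ids, ","), trimws)

# Long-format map: one row per (probe, gene) pair
mapper <- data.frame(probe = rep(seq_along(id_list), lengths(id_list)),
                     gene  = unlist(id_list))

# Repeat each probe row once for every gene it maps to
expanded <- expr[mapper$probe, , drop = FALSE]

# Sum within each gene, then divide by the number of contributing probes
gene_sums  <- rowsum(expanded, group = mapper$gene)
gene_n     <- as.vector(table(mapper$gene)[rownames(gene_sums)])
gene_means <- gene_sums / gene_n

gene_means
```

Here geneB is averaged over probes 1 and 2, and geneC over probes 1, 3 and 4, so a multi-mapping probe contributes to every gene it hits. With an MAList such as MA_normalized, expr would presumably be MA_normalized$M and ids would be MA_normalized$genes$SystematicName; that substitution is an assumption and has not been tested on real data.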